LeMay, Michael via llvm-dev
2017-Feb-08 00:05 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
Hi,

I previously posted about using 32-bit X86 segmentation to harden SafeStack: http://lists.llvm.org/pipermail/llvm-dev/2016-May/100346.html That involves lowering the limits of the DS and ES segments that are used for ordinary data accesses while leaving the limit for SS, the stack segment, set to its maximum value. The safe stacks were clustered above the limits of DS and ES. Thus, by directing individual memory operands to either DS/ES or SS, stray pointer writes that could otherwise corrupt the safe stack would be blocked by the segmentation checks. My proposed compiler modifications inspect memory operands to determine whether the compiler (or more specifically, the SafeStack pass) intends that they be allowed to access the safe stack. They then insert segment override prefixes and related instructions as necessary.

I submitted patches today to implement an analogous idea in 64-bit mode using Intel MPX. MPX can be used both to enforce fine-grained per-object bounds and coarse-grained bounds. My patches use it for the latter purpose, so they make no use of the table-related instructions in MPX. The runtime library [1] simply initializes one bounds register, BND0, to have an upper bound that is set below all safe stacks and above all ordinary data. A pre-isel patch instruments stores that are not authorized to access the safe stack by preceding each such instruction with a BNDCU instruction. That checks whether the following store accesses memory that is entirely below the upper bound in BND0 [2]. Loads are not instrumented, since the purpose of the checks is only to help prevent corruption of the safe stacks. Authorized safe stack accesses are not instrumented, since the SafeStack pass is responsible for verifying that such accesses do not corrupt the safe stack. The default handler is used when a bound check fails, which results in the program being terminated on the systems where I have performed tests.
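As a rough model of the check described above (a sketch under assumptions, not code from the patches): ordinary stores are preceded by a comparison against a single upper bound, while stores that the SafeStack pass authorizes skip it. The names `BoundViolation`, `bndcu`, `instrumented_store` and `authorized_store` are hypothetical, the addresses are made up, and the bound is treated as inclusive, as MPX bounds are.

```python
class BoundViolation(Exception):
    """Stands in for the #BR fault raised by a failing bound check."""


def bndcu(address, size, upper_bound):
    """Model of BNDCU: fault if the access extends above the upper bound."""
    last_byte = address + size - 1
    if last_byte > upper_bound:
        raise BoundViolation(hex(last_byte))


def instrumented_store(memory, address, value, size, upper_bound):
    """Ordinary store: checked first, so it cannot reach the safe stacks."""
    bndcu(address, size, upper_bound)
    memory[address] = value


def authorized_store(memory, address, value):
    """Safe-stack store vetted by the SafeStack pass: no check is emitted."""
    memory[address] = value


# Hypothetical layout: ordinary data at or below BND0_UPPER, safe stacks above.
BND0_UPPER = 0x7FFF_EFFF_FFFF
memory = {}
instrumented_store(memory, 0x6000_0000, 1, 4, BND0_UPPER)  # passes the check
authorized_store(memory, 0x7FFF_F000_0010, 2)              # unchecked by design
try:
    # A stray pointer write aimed at the safe stack is stopped by the check.
    instrumented_store(memory, 0x7FFF_F000_0010, 3, 4, BND0_UPPER)
    blocked = False
except BoundViolation:
    blocked = True
```

Loads have no counterpart in this model because, as stated above, only stores are instrumented.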
To reduce the performance and size overhead from instrumenting the code, both the pre-isel patch and a pre-emit patch elide various checks [2, 3]. The pre-isel patch uses techniques derived from the BoundsChecking pass to statically verify that some stores are safe so that the checks for those stores can be elided. The pre-emit patch compares the bound checks in each basic block and combines those that are redundant. The contents of BND0 are static, so a successful check of a higher address implies that any check of a lower address will also succeed. Thus, if a check of a higher address precedes a check of a lower address in a basic block, the latter check can be erased. On the other hand, if a check of a lower address precedes a check of a higher address in a basic block, then the latter check can still be erased, but it is also necessary to use the higher address in the remaining check. However, my pass is only able to statically compare certain addresses, which limits the checks that can be combined. For example, if two addresses use the same base and index registers and scale along with a simple displacement, then my pass may be able to compare them. However, if either the base or the index register is redefined by an instruction between the two checks, then my pass is currently unable to compare the two addresses. Incidentally, the pre-emit pass uses the getAddressFromInstr routine, which needs to be patched to properly handle certain global variable addresses [7]. The pre-emit pass also erases checks for addresses that do not specify a base or index register as well as those that specify a RIP-relative offset with no index register. I think that the source code would need to be quite malformed to corrupt safe stacks using such address types. Additional optimizations may be possible in the future, such as lifting checks out of loops or otherwise performing inter-basic block analysis to identify additional redundant checks. 
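The combining rule in the preceding paragraph can be sketched as follows. This is only an illustrative model, assuming a basic block is a list of tuples and that addresses are comparable when they share base, index and scale; `Addr` and `combine_checks` are hypothetical names and do not correspond to anything in the actual pass.

```python
from typing import NamedTuple, Optional


class Addr(NamedTuple):
    base: Optional[str]
    index: Optional[str]
    scale: int
    disp: int


def combine_checks(block):
    """Combine redundant upper-bound checks within one basic block.

    Each instruction is ("check", Addr), ("def", register_name) or
    ("other", description).
    """
    out = []
    live = {}  # (base, index, scale) -> position in `out` of the kept check
    for inst in block:
        kind = inst[0]
        if kind == "def":
            # A redefined register makes earlier checks that use it incomparable.
            reg = inst[1]
            live = {k: v for k, v in live.items() if reg not in (k[0], k[1])}
            out.append(inst)
        elif kind == "check":
            addr = inst[1]
            key = (addr.base, addr.index, addr.scale)
            if key in live:
                kept_pos = live[key]
                if addr.disp > out[kept_pos][1].disp:
                    # Lower address was checked first: raise the kept check
                    # to the higher address before dropping the later one.
                    out[kept_pos] = ("check", addr)
                # In both cases the later check is erased.
            else:
                live[key] = len(out)
                out.append(inst)
        else:
            out.append(inst)
    return out


low_then_high = combine_checks([
    ("check", Addr(None, "rax", 8, 4)),
    ("other", "store"),
    ("check", Addr(None, "rax", 8, 8)),
    ("other", "store"),
])
redefined = combine_checks([
    ("check", Addr(None, "rax", 8, 4)),
    ("def", "rax"),
    ("check", Addr(None, "rax", 8, 8)),
])
```

In `low_then_high` a single check of the higher displacement survives ahead of both stores; in `redefined` both checks are kept because `rax` changes between them.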
The pre-emit pass also erases bound checks for accesses relative to a non-default segment, such as thread-local accesses relative to FS. Linear addresses for thread-local accesses are computed with a non-zero segment base address, so it would be necessary to check thread-local effective addresses against a bounds register with an upper bound that is adjusted down to account for that rather than the bounds register that is used for checking other accesses. However, negative offsets are sometimes used for thread-local accesses, which are treated as very large unsigned effective addresses. Checking them would require them to first be added to the base of the thread-local storage segment.

Developers can use the -mseparate-stack-seg flag to enable instrumentation of functions that have the SafeStack attribute [4, 6]. That flag also causes the runtime library to be linked [5].

Due to BND0 being treated as per-thread state, the runtime library picks an initial BND0 upper bound when the program starts that is arbitrarily set to be 256MiB below the base of the initial (safe) stack. If and when that 256MiB space becomes overfilled by safe stacks, the program will crash due to a failing CHECK_GE statement in the runtime library. Without this check, an adversary may be able to modify a variable in the runtime library recording the address of the most-recently allocated safe stack to cause safe stacks to be allocated in vulnerable locations. An alternative approach to avoid that limitation could be to store that variable above the bound checked by the instrumented code. This could help to prevent adversaries from forcing safe stacks to be allocated at vulnerable locations while still allowing the program to keep running even when its safe stacks protrude below the bound. Of course, the protruding portions of the safe stacks would be vulnerable. Another alternative could be to treat the MPX bounds registers as per-program state rather than per-thread state. BND0 could then be adjusted downwards as necessary when new safe stacks are allocated.

The runtime library currently ignores all attribute settings passed to pthread_create. It allocates a safe stack itself at a high address. Furthermore, the runtime library currently does not support expansion of the safe stacks, nor does it free the safe stacks that it allocates.

If the BND0 upper bound happens to fall below some ordinary data, then attempted accesses to that data by instrumented stores will violate the associated bound checks.

The runtime library checks for MPX support when it initializes, and it falls back to the default (ASLR-based) safe stack protections if MPX is unavailable. However, (inactive) bound check instructions in the program may still impose size and performance overheads.

There could conceivably be a situation in which an instrumented program passes a function pointer for an instrumented callback to an uninstrumented library. If that library allocates an object on the (safe) stack and passes its pointer to the callback, then a bound check violation could result. This is due to an assumption in the pass that instruments the code. It assumes that all pointer arguments except those with the byval or readnone attributes point to the unsafe stack. I think this is a valid assumption when all stack frames correspond to instrumented functions. However, it is not a valid assumption in the scenario described above. Such bound check violations can be avoided by not instrumenting such callbacks as well as any functions to which they pass pointers to any allocations on the safe stack.

Comments appreciated.
Thanks,
Michael

[1] [safestack] Add runtime support for MPX-based hardening: https://reviews.llvm.org/D29657
[2] [X86] Add X86SafeStackBoundsChecking pass: https://reviews.llvm.org/D29649
[3] [X86] Add X86SafeStackBoundsCheckingCombiner pass: https://reviews.llvm.org/D29652
[4] [X86] Add -mseparate-stack-seg: https://reviews.llvm.org/D17092
[5] [X86] Link safestacksepseg runtime: https://reviews.llvm.org/D29655
[6] [X86] Add separate-stack-seg feature: https://reviews.llvm.org/D29646
[7] [x86] Fix getAddressFromInstr: https://reviews.llvm.org/D27169
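The fixed 256MiB reservation described in the message above can be illustrated with a small model. This is a sketch under stated assumptions rather than the runtime library's code: `SafeStackRegion` and its fields are hypothetical, "base" is taken to mean the lowest address of the initial safe stack, new safe stacks are assumed to be carved out downward from it, and the 8MiB stack size is arbitrary.

```python
MIB = 1 << 20
RESERVED = 256 * MIB


class SafeStackRegion:
    def __init__(self, initial_stack_low):
        # Lowest address any safe stack may occupy (assumed layout).
        self.limit = initial_stack_low - RESERVED
        # BND0 upper bound: highest address an instrumented store may touch.
        self.bnd0_upper = self.limit - 1
        self.lowest_allocated = initial_stack_low

    def allocate(self, size):
        """Reserve a new safe stack directly below the previous one."""
        new_low = self.lowest_allocated - size
        # Models the CHECK_GE in the runtime library: abort rather than
        # place a safe stack where instrumented stores could reach it.
        if not new_low >= self.limit:
            raise RuntimeError("CHECK_GE failed: safe stack space exhausted")
        self.lowest_allocated = new_low
        return new_low


region = SafeStackRegion(0x7FFF_F000_0000)  # made-up address
stacks = [region.allocate(8 * MIB) for _ in range(32)]
try:
    region.allocate(8 * MIB)
    exhausted = False
except RuntimeError:
    exhausted = True
```

Because the model never frees a stack, the reservation is consumed permanently by thread creation, which is the limitation the message points out.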
Kostya Serebryany via llvm-dev
2017-Feb-08 00:11 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
(explicitly CC-ing more folks, just in case)
Kostya Serebryany via llvm-dev
2017-Feb-08 04:02 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
On Tue, Feb 7, 2017 at 4:05 PM, LeMay, Michael via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> ...
> I submitted patches today to implement an analogous idea in 64-bit mode
> using Intel MPX. MPX can be used both to enforce fine-grained per-object
> bounds and coarse-grained bounds. My patches use it for the latter
> purpose, so they make no use of the table-related instructions in MPX.

That's a relief :)

> The runtime library [1] simply initializes one bounds register, BND0, to
> have an upper bound that is set below all safe stacks and above all
> ordinary data.

So you enforce that safe stacks and other data are not intermixed, as you explain below. What are the downsides? Performance? Compatibility?

> A pre-isel patch instruments stores that are not authorized to access the
> safe stack by preceding each such instruction with a BNDCU instruction.

My understanding is that BNDCU is the cheapest possible instruction, just like XOR or ADD, so the overhead should be relatively small. Still my guesstimate would be >= 5% since stores are very numerous. And such overhead will be on top of whatever overhead SafeStack has. Do you have any measurements to share?

> ...
> To reduce the performance and size overhead from instrumenting the code,
> both the pre-isel patch and a pre-emit patch elide various checks [2, 3].
> The pre-isel patch uses techniques derived from the BoundsChecking pass to
> statically verify that some stores are safe so that the checks for those
> stores can be elided. The pre-emit patch compares the bound checks in each
> basic block and combines those that are redundant. The contents of BND0
> are static, so a successful check of a higher address implies that any
> check of a lower address will also succeed. Thus, if a check of a higher
> address precedes a check of a lower address in a basic block, the latter
> check can be erased. On the other hand, if a check of a lower address
> precedes a check of a higher address in a basic block, then the latter
> check can still be erased, but it is also necessary to use the higher
> address in the remaining check. However, my pass is only able to
> statically compare certain addresses, which limits the checks that can be
> combined. For example, if two addresses use the same base and index
> registers and scale along with a simple displacement, then my pass may be
> able to compare them. However, if either the base or the index register is
> redefined by an instruction between the two checks, then my pass is
> currently unable to compare the two addresses.

The usual question in such a situation: how do we verify that the optimizations are not too optimistic? If we remove a check that is not in fact redundant, we will never know, until clever folks use it for an exploit (and maybe not even then).

> ...

Thanks!
--kcc
LeMay, Michael via llvm-dev
2017-Feb-09 00:51 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
On 2/7/2017 20:02, Kostya Serebryany wrote:> On Tue, Feb 7, 2017 at 4:05 PM, LeMay, Michael via llvm-dev > <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: > >...> > > The runtime library [1] simply initializes one bounds register, > BND0, to have an upper bound that is set below all safe stacks and > above all ordinary data. > > > So you enforce that safe stacks and other data are not intermixed, as > you explain below. > What are the downsides? Performance? Compatibility?I think the main downside is that only a limited number of threads can be created before the safe stacks would protrude below the bound. Extending the proposed runtime library to deallocate safe stacks when they are no longer needed may help with this. The safe stacks are also prevented from expanding, since they are allocated contiguously at high addresses.> A pre-isel patch instruments stores that are not authorized to > access the safe stack by preceding each such instruction with a > BNDCU instruction. > > > My understanding is that BNDCU is the cheapest possible instruction, > just like XOR or ADD, > so the overhead should be relatively small. > Still my guesstimate would be >= 5% since stores are very numerous. > And such overhead will be on top of whatever overhead SafeStack has. > Do you have any measurements to share?I'm working on getting approval to release some benchmark results.> That checks whether the following store accesses memory that is > entirely below the upper bound in BND0 [2]. Loads are not > instrumented, since the purpose of the checks is only to help > prevent corruption of the safe stacks. Authorized safe stack > accesses are not instrumented, since the SafeStack pass is > responsible for verifying that such accesses do not corrupt the > safe stack. The default handler is used when a bound check fails, > which results in the program being terminated on the systems where > I have performed tests. 
> > To reduce the performance and size overhead from instrumenting the > code, both the pre-isel patch and a pre-emit patch elide various > checks [2, 3]. The pre-isel patch uses techniques derived from > the BoundsChecking pass to statically verify that some stores are > safe so that the checks for those stores can be elided. The > pre-emit patch compares the bound checks in each basic block and > combines those that are redundant. The contents of BND0 are > static, so a successful check of a higher address implies that any > check of a lower address will also succeed. Thus, if a check of a > higher address precedes a check of a lower address in a basic > block, the latter check can be erased. On the other hand, if a > check of a lower address precedes a check of a higher address in a > basic block, then the latter check can still be erased, but it is > also necessary to use the higher address in the remaining check. > However, my pass is only able to statically compare certain > addresses, which limits the checks that can be combined. For > example, if two addresses use the same base and index registers > and scale along with a simple displacement, then my pass may be > able to compare them. However, if either the base or the index > register is redefined by an instruction between the two checks, > then my pass is currently unable to compare the two addresses. > > > The usual question in such situation: how do we verify that the > optimizations are not too optimistic? > If we remove a check that is not in fact redundant, we will never > know, until clever folks use it for an exploit (and maybe not even then).The pre-emit pass is able to verify that some checks are redundant by inspecting the operands used to specify an address. 
For example, consider the following test for the pre-emit pass: 0: %rax = MOVSX64rr32 killed %edi 1: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 4, _ ; CHECK: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _ 2: MOV32mi _, 8, %rax, @x, _, 0 3: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _ ; CHECK-NOT: INLINEASM $"bndcu $0, %bnd0", 8, 196654, _, 8, %rax, @x + 8, _ 4: MOV32mi _, 8, killed %rax, @x + 4, _, 0 The pass verifies that the only difference between the memory operands in instructions 1 and 3 is that they use a different offset from the global variable, so they can be combined. The pass also tracks register definitions, so it would know not to combine the checks in this example if there had been an instruction that redefined %rax between instructions 1 and 3. On the other hand, some of the optimizations described in the next couple of paragraphs may be optimistic, so I especially welcome feedback on them: ...> The pre-emit pass also erases checks for addresses that do not > specify a base or index register as well as those that specify a > RIP-relative offset with no index register. I think that the > source code would need to be quite malformed to corrupt safe > stacks using such address types. >...> > The pre-emit pass also erases bound checks for accesses relative > to a non-default segment, such as thread-local accesses relative > to FS. Linear addresses for thread-local accesses are computed > with a non-zero segment base address, so it would be necessary to > check thread-local effective addresses against a bounds register > with an upper bound that is adjusted down to account for that > rather than the bounds register that is used for checking other > accesses. However, negative offsets are sometimes used for > thread-local accesses, which are treated as very large unsigned > effective addresses. Checking them would require them to first be > added to the base of the thread-local storage segment. >... 
Thanks, Michael
LeMay, Michael via llvm-dev
2017-Feb-18 01:27 UTC
[llvm-dev] [RFC] Using Intel MPX to harden SafeStack
On 2/7/2017 20:02, Kostya Serebryany wrote:
> ...
>
> My understanding is that BNDCU is the cheapest possible instruction,
> just like XOR or ADD,
> so the overhead should be relatively small.
> Still my guesstimate would be >= 5% since stores are very numerous.
> And such overhead will be on top of whatever overhead SafeStack has.
> Do you have any measurements to share?
>

Here are estimated SPECint_base2006 component runtimes for some relevant test configurations:

Runtime in seconds:

+----------------+-----------+-----------+-----------+---------+
| Benchmark      | Unpatched | Unpatched | Patched   | MPX     |
|                | Base      | SafeStack | SafeStack |         |
+----------------+-----------+-----------+-----------+---------+
| 400.perlbench  |    430.82 |    443.07 |    442.42 |  456.34 |
| 401.bzip2      |    711.43 |    716.59 |    717.35 |  750.06 |
| 403.gcc        |    333.76 |    334.11 |    334.95 |  336.13 |
| 429.mcf        |    371.48 |    375.75 |    373.50 |  377.93 |
| 445.gobmk      |    677.80 |    686.12 |    685.50 |  702.87 |
| 456.hmmer      |    534.94 |    533.68 |    534.37 |  553.40 |
| 458.sjeng      |    633.69 |    641.21 |    641.81 |  655.94 |
| 462.libquantum |    362.82 |    367.00 |    367.38 |  382.14 |
| 464.h264ref    |    701.37 |    682.13 |    683.41 |  699.93 |
| 471.omnetpp    |    397.04 |    407.38 |    407.33 |  411.36 |
| 473.astar      |    611.51 |    610.46 |    610.19 |  624.78 |
| 483.xalancbmk  |    291.66 |    295.61 |    296.42 |  298.29 |
+----------------+-----------+-----------+-----------+---------+
| SUM            |   6058.32 |   6093.10 |   6094.62 | 6249.16 |
+----------------+-----------+-----------+-----------+---------+

These runtimes are estimates as benchmark runs for research purposes built with patched/experimental compilers cannot be benchmark compliant. Compilation flags that aren't yet fully documented also can not be compliant.

Percentage changes in runtime relative to Unpatched Base:

+----------------+-----------+-----------+-------+
| Benchmark      | Unpatched | Patched   | MPX   |
|                | SafeStack | SafeStack |       |
+----------------+-----------+-----------+-------+
| 400.perlbench  |      2.84 |      2.69 |  5.93 |
| 401.bzip2      |      0.73 |      0.83 |  5.43 |
| 403.gcc        |      0.10 |      0.36 |  0.71 |
| 429.mcf        |      1.15 |      0.54 |  1.74 |
| 445.gobmk      |      1.23 |      1.14 |  3.70 |
| 456.hmmer      |     -0.24 |     -0.11 |  3.45 |
| 458.sjeng      |      1.19 |      1.28 |  3.51 |
| 462.libquantum |      1.15 |      1.26 |  5.32 |
| 464.h264ref    |     -2.74 |     -2.56 | -0.21 |
| 471.omnetpp    |      2.60 |      2.59 |  3.61 |
| 473.astar      |     -0.17 |     -0.21 |  2.17 |
| 483.xalancbmk  |      1.35 |      1.63 |  2.27 |
+----------------+-----------+-----------+-------+
| SUM            |      0.57 |      0.60 |  3.15 |
+----------------+-----------+-----------+-------+

These measurements were collected on an Intel NUC6i5SY with an Intel Core i5-6260U CPU and 32G RAM running Clear Linux 13330. Intel Hyper-Threading, Intel Turbo Boost, and the LAN were all disabled. I used SPEC CPU2006 v1.2 and started the Clang/LLVM port from the gcc 4.6 Linux x86 example file included in the SPEC CPU 2006 kit.

Here is the legend for the various test configurations:

- Unpatched Base: Unpatched compiler with SafeStack disabled. This is the reference configuration.
- Unpatched SafeStack: Unpatched compiler with SafeStack enabled.
- Patched SafeStack: Patched compiler with SafeStack enabled. However, MPX-based hardening is not enabled in this configuration. This configuration is intended to show the effect of the Compiler-RT patches on programs that do not enable MPX-based hardening.
- MPX: Patched compiler with MPX-hardened SafeStack enabled.

The unpatched compiler was built from the following SVN IDs:

- LLVM: 292171 from January 16, 2017
- Clang: 292141 from January 16, 2017
- Compiler-RT: 291346 from January 7, 2017

The patched compiler was built with the current posted versions of my patches applied on top of the SVN IDs listed above.

The following compiler settings in the SPEC CPU2006 cfg files were used for each configuration:

COPTIMIZE:
- Unpatched Base: -std=gnu89 -O2 -fno-strict-aliasing -march=skylake -mtune=skylake
- Unpatched/Patched SafeStack: -std=gnu89 -O2 -fno-strict-aliasing -march=skylake -mtune=skylake -fsanitize=safe-stack
- MPX: -std=gnu89 -O2 -fno-strict-aliasing -march=skylake -mtune=skylake -mseparate-stack-seg -fsanitize=safe-stack

CXXOPTIMIZE:
- Unpatched Base: -O2 -fno-strict-aliasing -march=skylake -mtune=skylake
- Unpatched/Patched SafeStack: -O2 -fno-strict-aliasing -march=skylake -mtune=skylake -fsanitize=safe-stack
- MPX: -O2 -fno-strict-aliasing -march=skylake -mtune=skylake -mseparate-stack-seg -fsanitize=safe-stack

The FOPTIMIZE settings are irrelevant, since none of the SPECint tests use Fortran.
Here are measurements of the absolute sizes of the .text sections for the programs as well as percentage changes in those sizes:

.text section size in bytes:

+----------------+-----------+-----------+-----------+---------+
| Benchmark      | Unpatched | Unpatched | Patched   | MPX     |
|                | Base      | SafeStack | SafeStack |         |
+----------------+-----------+-----------+-----------+---------+
| 400.perlbench  |    884769 |   1003041 |   1003233 | 1131769 |
| 401.bzip2      |     79393 |    175297 |    175489 |  235577 |
| 403.gcc        |   2420209 |   2545041 |   2545233 | 2727913 |
| 429.mcf        |     10977 |    105345 |    105537 |  155705 |
| 445.gobmk      |    633953 |    743585 |    743777 |  823993 |
| 456.hmmer      |    258593 |    358033 |    358225 |  432249 |
| 458.sjeng      |     96593 |    192929 |    193121 |  251545 |
| 462.libquantum |     32441 |    127065 |    127257 |  177545 |
| 464.h264ref    |    539713 |    638705 |    638897 |  736729 |
| 471.omnetpp    |    403521 |    527345 |    527537 |  597801 |
| 473.astar      |     31169 |    126225 |    126417 |  178105 |
| 483.xalancbmk  |   2358241 |   2725921 |   2726113 | 2936841 |
+----------------+-----------+-----------+-----------+---------+

Percentage changes in .text section size relative to Unpatched Base:

+----------------+-----------+-----------+---------+
| Benchmark      | Unpatched | Patched   | MPX     |
|                | SafeStack | SafeStack |         |
+----------------+-----------+-----------+---------+
| 400.perlbench  |     13.37 |     13.39 |   27.92 |
| 401.bzip2      |    120.80 |    121.04 |  196.72 |
| 403.gcc        |      5.16 |      5.17 |   12.71 |
| 429.mcf        |    859.69 |    861.44 | 1318.47 |
| 445.gobmk      |     17.29 |     17.32 |   29.98 |
| 456.hmmer      |     38.45 |     38.53 |   67.15 |
| 458.sjeng      |     99.73 |     99.93 |  160.42 |
| 462.libquantum |    291.68 |    292.27 |  447.29 |
| 464.h264ref    |     18.34 |     18.38 |   36.50 |
| 471.omnetpp    |     30.69 |     30.73 |   48.15 |
| 473.astar      |    304.97 |    305.59 |  471.42 |
| 483.xalancbmk  |     15.59 |     15.60 |   24.54 |
+----------------+-----------+-----------+---------+
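For anyone who wants to recheck the percentage columns, they appear to follow the usual relative-change formula against the Unpatched Base column. The snippet below is a small sketch of that arithmetic using figures copied from the tables in this message; the `percent_change` helper is a name made up for this illustration. Per-benchmark runtime percentages may differ by 0.01 from what this yields, presumably because the published percentages were computed from unrounded runtimes.

```python
def percent_change(value: float, baseline: float) -> float:
    """Relative change of `value` against `baseline`, in percent, to two decimals."""
    return round((value - baseline) / baseline * 100.0, 2)


# SUM row of the runtime table (seconds).
runtime_sum = {
    "Unpatched Base": 6058.32,
    "Unpatched SafeStack": 6093.10,
    "Patched SafeStack": 6094.62,
    "MPX": 6249.16,
}

# 429.mcf row of the .text size table (bytes).
mcf_text = {
    "Unpatched Base": 10977,
    "Unpatched SafeStack": 105345,
    "Patched SafeStack": 105537,
    "MPX": 155705,
}


def changes(row: dict) -> dict:
    """Percentage change of every configuration relative to Unpatched Base."""
    base = row["Unpatched Base"]
    return {name: percent_change(v, base)
            for name, v in row.items() if name != "Unpatched Base"}


runtime_changes = changes(runtime_sum)
mcf_changes = changes(mcf_text)
```

The very large percentages for small programs such as 429.mcf reflect a small baseline rather than an unusually large absolute increase: the absolute growth from Unpatched Base to Unpatched SafeStack is roughly 94 KB to 125 KB for most C benchmarks in the table, whatever their starting size.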