Bill Wendling via llvm-dev
2020-Aug-12 21:44 UTC
[llvm-dev] [RFC] Zeroing Caller Saved Regs
On Mon, Aug 10, 2020 at 3:34 AM David Chisnall
<David.Chisnall at cl.cam.ac.uk> wrote:
>
> Thanks,
>
> On 07/08/2020 23:28, Kees Cook wrote:
> > On Fri, Aug 7, 2020 at 1:18 AM David Chisnall
> > <David.Chisnall at cl.cam.ac.uk> wrote:
> >> I think it would be useful for the discussion to have a clear threat
> >> model that this intends to defend against and a rough analysis of the
> >> security benefits that this is believed to bring.
> >
> > I view this as being even more about a ROP defense. Dealing with spill
> > slots is, IMO, a separate issue, more related to the auto-var-init
> > work (though that would be stack erasure on function exit, rather than
> > entry, which addresses a different set of issues). I think this thread
> > from the GCC list has some good details on the ROP defense:
> >
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551607.html
>
> This link gives two motivations:
>
> 1. Reducing information leak (which I find unconvincing, because there's
> a lot more left on the stack than in caller-save registers).
> 2. Reducing ROP gadgets.
>
> Unfortunately, for claim 2 they cite a paper that is behind a paywall,
> so I can't easily see what it's doing and I'll have to guess what the
> paper says:
>
> Caller-save registers are intuitively useful in the first gadget in a
> ROP sequence, because the current frame will have put values into them
> (and so they are most likely to hold attacker-controlled values). I can
> quite easily imagine a paper that shows that you break the first gadget
> in a chain with this mitigation.
>
> It's possible that it would also significantly reduce the number of
> total gadgets if each ret is preceded by the zeroing sequence,
> effectively denying the attacker the ability to use these registers.
> Unfortunately, to be able to make arbitrary calls they would just need
> one unguarded forward control-flow edge that loaded a function pointer
> and its arguments from the stack, and I can't imagine that such a gadget
> is absent from most nontrivial codebases. I'd like to see an analysis
> of the gadgets remaining when this mitigation is used.
>
> I don't object to adding a flag that makes the Linux kernel slower, but
> if it is being advertised as a security feature then I would like to see
> some evidence that it does something other than require automated attack
> tools to pick a different set of gadgets to use.
>

After reading the paper they link to, I'm rethinking this feature. :-)

From what I can gather from the paper, they use a tool to determine
which scratch (caller-saved) registers are used in a function call.
They then use some type of instrumentation to zero out those scratch
registers. This can apparently break the chain.
For example, in line 17 below, RDI will be zeroed out, as is RSI in line 19:

1: p = ''
2: p += pack('<Q', 0x0000000000401627) # pop rsi ; ret
3: p += pack('<Q', 0x00000000006ca080) # @ .data
4: p += pack('<Q', 0x00000000004784d6) # pop rax ; pop rdx ; pop rbx ; ret
5: p += '/bin//sh'
6: p += pack('<Q', 0x4141414141414141) # padding
7: p += pack('<Q', 0x4141414141414141) # padding
8: p += pack('<Q', 0x0000000000473f81) # mov qword ptr [rsi], rax ; ret
9: p += pack('<Q', 0x0000000000401627) # pop rsi ; ret
10: p += pack('<Q', 0x00000000006ca088) # @ .data + 8
11: p += pack('<Q', 0x0000000000425e3f) # xor rax, rax ; ret
12: p += pack('<Q', 0x0000000000473f81) # mov qword ptr [rsi], rax ; ret
13: p += pack('<Q', 0x00000000004784d6) # pop rax ; pop rdx ; pop rbx ; ret
14: p += p64(59) # execve syscall number
15: p += pack('<Q', 0x4141414141414141) # padding
16: p += pack('<Q', 0x4141414141414141) # padding
17: p += pack('<Q', 0x0000000000401506) # pop rdi ; ret
18: p += pack('<Q', 0x00000000006ca080) # @ .data
19: p += pack('<Q', 0x0000000000401627) # pop rsi ; ret
20: p += pack('<Q', 0x00000000006ca088) # @ .data + 8
21: p += pack('<Q', 0x0000000000442636) # pop rdx ; ret
22: p += pack('<Q', 0x00000000006ca088) # @ .data + 8
23: p += pack('<Q', 0x0000000000467175) # syscall ; ret

Their instrumentation is impractical, though, as it increases the
runtime by over 16x.

My guess is that inserting zeroing instructions right before the "ret"
instruction can disable some of the hacks we see with ROP:

`pop rdi ; ret` becomes `pop rdi ; xor rdi, rdi ; ret`

-bw
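To make the gadget example above concrete, here is a minimal sketch that disassembles the plain `pop rdi ; ret` gadget next to a zero-padded version. It assumes the Capstone disassembler (`pip install capstone`); the byte encodings and the 0x401506 address are illustrative assumptions, not output from any real binary or from a compiler implementing this feature.

```python
# Sketch only: compares the plain gadget with a zero-padded version.
# Requires the Capstone disassembler; bytes and address are assumptions.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

md = Cs(CS_ARCH_X86, CS_MODE_64)

gadgets = {
    "original":     bytes.fromhex("5fc3"),        # pop rdi ; ret
    "with zeroing": bytes.fromhex("5f4831ffc3"),  # pop rdi ; xor rdi, rdi ; ret
}

for name, code in gadgets.items():
    print(f"-- {name} --")
    for insn in md.disasm(code, 0x401506):
        print(f"  0x{insn.address:x}: {insn.mnemonic} {insn.op_str}")
```

With the extra `xor`, the value popped from the attacker-controlled stack is discarded before the `ret`, so the gadget no longer hands RDI to the attacker (assuming, of course, that execution enters the sequence at its intended start).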
On Wed, Aug 12, 2020 at 02:44:59PM -0700, Bill Wendling wrote:
> My guess is that inserting zeroing instructions right before the "ret"
> instruction can disable some of the hacks we see with ROP:
>
> `pop rdi ; ret` becomes `pop rdi ; xor rdi, rdi ; ret`

Right; this isn't meant to be a perfect defense. Nothing can be, really.
But it narrows the opportunities available to an attacker (whether it be
ROP, exposures, speculation, etc). The more deterministic the execution
paths, the lower the chance that each given path is both useful (i.e.
does work that helps an attacker) and available (i.e. can be "reached"
through some specific bug) to an attacker.

Given the near-zero cost (in both runtime and code size) of self-xor-ing
registers, it's a "free" change that has a greater-than-zero cost to an
attacker.

--
Kees Cook
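As a rough way to see the code-size side of that claim, here is a sketch that measures one hypothetical zeroing sequence for the x86-64 caller-saved general-purpose registers. The register set and the use of 32-bit `xor` forms are assumptions for illustration; they are not necessarily what GCC or Clang would emit for this feature.

```python
# Sketch only: sizes one hypothetical zeroing sequence for the x86-64
# caller-saved GPRs (rax, rcx, rdx, rsi, rdi, r8-r11). The register set
# and encodings are assumptions, not any compiler's actual output.
# Requires the Capstone disassembler.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

zeroing = bytes.fromhex(
    "31c0"    # xor eax, eax
    "31c9"    # xor ecx, ecx
    "31d2"    # xor edx, edx
    "31f6"    # xor esi, esi
    "31ff"    # xor edi, edi
    "4531c0"  # xor r8d, r8d
    "4531c9"  # xor r9d, r9d
    "4531d2"  # xor r10d, r10d
    "4531db"  # xor r11d, r11d
)

md = Cs(CS_ARCH_X86, CS_MODE_64)
for insn in md.disasm(zeroing, 0x0):
    print(f"{insn.size} bytes: {insn.mnemonic} {insn.op_str}")
print(f"total: {len(zeroing)} bytes before each protected return")
```

Writing the 32-bit forms zero-extends into the full 64-bit registers, which keeps each instruction at two or three bytes, and modern x86 cores recognize self-xor as a zeroing idiom, which is what keeps the runtime cost close to zero.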
Bill Wendling via llvm-dev
2020-Aug-12 22:11 UTC
[llvm-dev] [RFC] Zeroing Caller Saved Regs
On Wed, Aug 12, 2020 at 2:59 PM Kees Cook <keescook at chromium.org> wrote:
>
> On Wed, Aug 12, 2020 at 02:44:59PM -0700, Bill Wendling wrote:
> > My guess is that inserting zeroing instructions right before the "ret"
> > instruction can disable some of the hacks we see with ROP:
> >
> > `pop rdi ; ret` becomes `pop rdi ; xor rdi, rdi ; ret`
>
> Right; this isn't meant to be a perfect defense. Nothing can be, really.
> But it narrows the opportunities available to an attacker (whether it be
> ROP, exposures, speculation, etc). The more deterministic the execution
> paths, the lower the chance that each given path is both useful (i.e.
> does work that helps an attacker) and available (i.e. can be "reached"
> through some specific bug) to an attacker.
>
> Given the near-zero cost (in both runtime and code size) of self-xor-ing
> registers, it's a "free" change that has a greater-than-zero cost to an
> attacker.
>

I wanted to clarify that the 16x slowdown was in the authors'
implementation, which used instrumentation to inject code. But yeah,
this could help limit the avenues open to attackers.

-bw
Stephen Checkoway via llvm-dev
2020-Aug-13 03:38 UTC
[llvm-dev] [RFC] Zeroing Caller Saved Regs
> On Aug 12, 2020, at 17:44, Bill Wendling via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> My guess is that inserting zeroing instructions right before the "ret"
> instruction can disable some of the hacks we see with ROP:
>
> `pop rdi ; ret` becomes `pop rdi ; xor rdi, rdi ; ret`

Three comments on this.

1. The very first ROP paper [1] used only unintended instruction
sequences. That is, none of the return instructions were placed there by
the compiler; they appeared completely within other instructions.

2. ROP doesn't require any return instructions [2]. It can be performed
using call or jmp instructions.

3. As binaries get larger, the number of available instruction sequences
from which one can build gadgets increases dramatically. If the goal is
to make one system call like mprotect, you don't need very many at all.
If you want to get arbitrary computation using ROP and something like
mprotect doesn't exist (e.g., on a Harvard architecture machine), you
only need a few tens of kilobytes of code. I did it on the Z80 with 16 kB
of code and a hardware interlock that forced instructions to be fetched
from ROM [3].

There have been a bunch of defenses that purport to make attacks harder
by decreasing the number of useful instruction sequences available to
the attacker. They don't have a significant impact on attacks.

That's not to say that this couldn't be useful, but I'm skeptical it
would defend against ROP, or even make a ROP attack much more difficult.

1. https://hovav.net/ucsd/dist/geometry.pdf
2. https://checkoway.net/papers/noret_ccs2010/noret_ccs2010.pdf
3. https://checkoway.net/papers/evt2009/evt2009.pdf

--
Stephen Checkoway
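Stephen's first point, gadgets hiding inside the bytes of other instructions, is easy to reproduce. Here is a small sketch that decodes the same bytes at two different starting offsets; the instruction chosen (`mov eax, 0xc35f`) is an arbitrary illustrative example, not taken from any real binary, and Capstone is assumed as the disassembler.

```python
# Sketch only: shows an "unintended" gadget inside a legitimate instruction.
# The bytes b8 5f c3 00 00 encode `mov eax, 0xc35f`, but decoding one byte
# in yields `pop rdi ; ret` (the trailing zero bytes decode as a harmless
# `add`). Requires the Capstone disassembler.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

code = bytes.fromhex("b85fc30000")
md = Cs(CS_ARCH_X86, CS_MODE_64)

for off in (0, 1):
    print(f"-- decoding from offset {off} --")
    for insn in md.disasm(code[off:], 0x1000 + off):
        print(f"  0x{insn.address:x}: {insn.mnemonic} {insn.op_str}")
```

A compiler-inserted zeroing sequence only guards the returns the compiler knows about; it does nothing for a `c3` byte that happens to sit inside an immediate like this one.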
Bill Wendling via llvm-dev
2020-Aug-13 04:01 UTC
[llvm-dev] [RFC] Zeroing Caller Saved Regs
On Wed, Aug 12, 2020 at 8:38 PM Stephen Checkoway <s at pahtak.org> wrote:
>
> > On Aug 12, 2020, at 17:44, Bill Wendling via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:
> >
> > My guess is that inserting zeroing instructions right before the "ret"
> > instruction can disable some of the hacks we see with ROP:
> >
> > `pop rdi ; ret` becomes `pop rdi ; xor rdi, rdi ; ret`
>
> Three comments on this.
>
> 1. The very first ROP paper [1] used only unintended instruction
> sequences. That is, none of the return instructions were placed there by
> the compiler; they appeared completely within other instructions.
>
> 2. ROP doesn't require any return instructions [2]. It can be performed
> using call or jmp instructions.

Sure, but the authors of the paper claim that it's incredibly difficult
to have *only* COP / JOP gadgets. At some point you'll need to have an
ROP gadget:

"Usually, the gadgets of ROP end with a return instruction which we
called conventional ROP attacks. Call Oriented Programming (COP) [8] and
Jump-Oriented Programming (JOP) [9] are the variations of ROP attacks
without returns [10]. The variations use gadgets that end with indirect
call or jump instruction. However, performing ROP attacks without return
instruction in reality is difficult for the reason that the gadgets of
COP and JOP that can form a completed gadget chain are almost
nonexistent. Actually, adversaries prefer to use combinational gadgets
to evade current protection mechanisms."

> 3. As binaries get larger, the number of available instruction sequences
> from which one can build gadgets increases dramatically. If the goal is
> to make one system call like mprotect, you don't need very many at all.
> If you want to get arbitrary computation using ROP and something like
> mprotect doesn't exist (e.g., on a Harvard architecture machine), you
> only need a few tens of kilobytes of code. I did it on the Z80 with 16 kB
> of code and a hardware interlock that forced instructions to be fetched
> from ROM [3].
>
> There have been a bunch of defenses that purport to make attacks harder
> by decreasing the number of useful instruction sequences available to
> the attacker. They don't have a significant impact on attacks.
>
> That's not to say that this couldn't be useful, but I'm skeptical it
> would defend against ROP, or even make a ROP attack much more difficult.
>

This is why having variable-length instructions sucks. :-) I see your
point. When your email came in, I was actually looking at the code we
generate with the pop/xor when you start at different offsets in the
code.

-bw
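In that spirit, here is a sketch of what the hypothetical `pop rdi ; xor rdi, rdi ; ret` bytes look like when decoded from every starting offset. The encoding is an assumption (a compiler implementing this would more likely zero registers with 32-bit `xor` forms in the function epilogue), Capstone is assumed as the disassembler, and in a real binary decoding would continue into whatever bytes follow this window.

```python
# Sketch only: decodes the hypothetical `pop rdi ; xor rdi, rdi ; ret`
# bytes (5f 48 31 ff c3) from every starting offset to see which
# unintended sequences they contain. Requires the Capstone disassembler.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

code = bytes.fromhex("5f4831ffc3")
md = Cs(CS_ARCH_X86, CS_MODE_64)

for off in range(len(code)):
    parts = [f"{i.mnemonic} {i.op_str}".strip() for i in md.disasm(code[off:], off)]
    print(f"offset {off}: " + " ; ".join(parts))
```

Within this small window, the only new ret-terminated sequences zero RDI rather than load it, though as Stephen points out, a real binary gives the attacker far more bytes to work with.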