Joerg Roedel
2020-Jun-23 12:04 UTC
Should SEV-ES #VC use IST? (Re: [PATCH] Allow RDTSC and RDTSCP from userspace)
On Tue, Jun 23, 2020 at 01:48:18PM +0200, Peter Zijlstra wrote:> On Tue, Jun 23, 2020 at 01:30:07PM +0200, Joerg Roedel wrote:> But you cannot do a recursion check in #VC, because the NMI can happen > on the first instruction of #VC, before we can increment our counter, > and then the #VC can happen on NMI because the IST stack is a goner, and > we're fscked again (or on a per-cpu variable we touch in our elaborate > NMI setup, etc..).No, the recursion check is fine, because overwriting an already used IST stack doesn't matter (as long as it can be detected) if we are going to panic anyway. It doesn't matter because the kernel will not leave the currently running handler anymore. I agree there is no way to keep the system running if that happens, but that is also not what is wanted. If stack recursion happens, something malicious from the HV side is going on, and all the kernel needs to be able to is to safely and reliably detect the situation and panic the VM to prevent any data corruption or loss or even leakage.> I'll keep repeating this, x86_64 exceptions are a trainwreck, and IST in > specific is utter crap.I agree, but don't forget the most prominent underlying reason for IST: The SYSCALL gap. If SYSCALL would switch stacks most of those issues would not exist. IST would still be needed because there are no task gates in x86-64, but still... Regards, Joerg
Peter Zijlstra
2020-Jun-23 12:52 UTC
Should SEV-ES #VC use IST? (Re: [PATCH] Allow RDTSC and RDTSCP from userspace)
On Tue, Jun 23, 2020 at 02:04:33PM +0200, Joerg Roedel wrote:> On Tue, Jun 23, 2020 at 01:48:18PM +0200, Peter Zijlstra wrote: > > On Tue, Jun 23, 2020 at 01:30:07PM +0200, Joerg Roedel wrote: > > > But you cannot do a recursion check in #VC, because the NMI can happen > > on the first instruction of #VC, before we can increment our counter, > > and then the #VC can happen on NMI because the IST stack is a goner, and > > we're fscked again (or on a per-cpu variable we touch in our elaborate > > NMI setup, etc..). > > No, the recursion check is fine, because overwriting an already used IST > stack doesn't matter (as long as it can be detected) if we are going to > panic anyway. It doesn't matter because the kernel will not leave the > currently running handler anymore.You only have that guarantee when any SNP #VC from kernel is an automatic panic. But in that case, what's the point of having the recursion count?> > I'll keep repeating this, x86_64 exceptions are a trainwreck, and IST in > > specific is utter crap. > > I agree, but don't forget the most prominent underlying reason for IST: > The SYSCALL gap. If SYSCALL would switch stacks most of those issues > would not exist. IST would still be needed because there are no task > gates in x86-64, but still...We could all go back to int80 ;-) /me runs like heck
Joerg Roedel
2020-Jun-23 13:40 UTC
Should SEV-ES #VC use IST? (Re: [PATCH] Allow RDTSC and RDTSCP from userspace)
On Tue, Jun 23, 2020 at 02:52:01PM +0200, Peter Zijlstra wrote:> On Tue, Jun 23, 2020 at 02:04:33PM +0200, Joerg Roedel wrote: > > No, the recursion check is fine, because overwriting an already used IST > > stack doesn't matter (as long as it can be detected) if we are going to > > panic anyway. It doesn't matter because the kernel will not leave the > > currently running handler anymore. > > You only have that guarantee when any SNP #VC from kernel is an > automatic panic. But in that case, what's the point of having the > recursion count?It is not a recursion count, it is a stack-recursion check. Basically walk down the stack and look if your current stack is already in use. Yes, this can be optimized, but that is what is needed. IIRC the current prototype code for SNP just pre-validates all memory in the VM and doesn't support moving pages around on the host. So any #VC SNP exception would be fatal, yes. In a scenario with on-demand validation of guest pages and support for guest-assisted page-moving on the HV side it would be more complicated. Basically all memory that is accessed during #VC exception handling must stay validated at all times, including the IST stack. So saying this, I don't understand why _all_ SNP #VC exceptions from kernel space must be fatal? Regards, Joerg