Joerg Roedel
2021-Jun-08 09:54 UTC
[PATCH v3 0/7] x86/sev-es: Fixes for SEV-ES Guest Support
From: Joerg Roedel <jroedel at suse.de> Hi, here is the next revision of my pending fixes for SEV-ES guest support. Changes to the previous version are: - Removed the patches already merged - Added a new fix to map the EFI MOKVar table encrypted - Disabled IRQs when GHCB is active - Relaxed state tracking by using irqentry_enter()/exit instead of irqentry_nmi_enter()/exit() - Changed error reporting from insn_fetch_from_user*() as requested by Boris Changes are based on tip/x86/urgent. Please review. Thanks, Joerg Joerg Roedel (6): x86/sev-es: Fix error message in runtime #VC handler x86/sev-es: Disable IRQs while GHCB is active x86/sev-es: Run #VC handler in plain IRQ state x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip() x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]() x86/sev-es: Propagate #GP if getting linear instruction address failed Tom Lendacky (1): x86/ioremap: Map efi_mem_reserve() memory as encrypted for SEV arch/x86/kernel/sev.c | 61 +++++++++++++++++++++++++--------------- arch/x86/kernel/umip.c | 10 +++---- arch/x86/lib/insn-eval.c | 22 +++++++++------ arch/x86/mm/ioremap.c | 4 ++- 4 files changed, 59 insertions(+), 38 deletions(-) base-commit: 009767dbf42ac0dbe3cf48c1ee224f6b778aa85a -- 2.31.1
Joerg Roedel
2021-Jun-08 09:54 UTC
[PATCH v3 1/7] x86/ioremap: Map efi_mem_reserve() memory as encrypted for SEV
From: Tom Lendacky <thomas.lendacky at amd.com> Some drivers require memory that is marked as EFI boot services data. So that this memory is not re-used by the kernel after ExitBootServices(), efi_mem_reserve() is used to preserve it by inserting a new EFI memory descriptor and marking it with the EFI_MEMORY_RUNTIME attribute. Under SEV, memory marked with the EFI_MEMORY_RUNTIME attribute needs to be mapped encrypted by Linux, otherwise the kernel might crash at boot like below: EFI Variables Facility v0.08 2004-May-17 general protection fault, probably for non-canonical address 0x3597688770a868b2: 0000 [#1] SMP NOPTI CPU: 13 PID: 1 Comm: swapper/0 Not tainted 5.12.4-2-default #1 openSUSE Tumbleweed Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:efi_mokvar_entry_next+0x34/0x40 Code: c5 01 48 8b 17 48 c7 07 00 00 00 00 48 85 c0 74 24 48 85 d2 74 14 80 3a 00 74 18 48 8b 82 00 01 00 00 48 8d 84 02 08 01 00 00 <80> 38 00 74 04 48 89 07 c3 31 c0 c3 0f 1f 44 00 00 41 54 4c 8b 25 [...] Call Trace: efi_mokvar_sysfs_init ? efi_mokvar_table_init do_one_initcall ? __kmalloc kernel_init_freeable ? rest_init kernel_init ret_from_fork Modules linked in: ---[ end trace 0de27ecc25d41b73 ]--- Expand the __ioremap_check_other() function to additionally check for this other type of "runtime" data and indicate that it should be mapped encrypted for an SEV guest. Fixes: 58c909022a5a ("efi: Support for MOK variable config table") Reported-by: Joerg Roedel <jroedel at suse.de> Tested-by: Joerg Roedel <jroedel at suse.de> Signed-off-by: Tom Lendacky <thomas.lendacky at amd.com> Signed-off-by: Joerg Roedel <jroedel at suse.de> --- arch/x86/mm/ioremap.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 12c686c65ea9..60ade7dd71bd 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -118,7 +118,9 @@ static void __ioremap_check_other(resource_size_t addr, struct ioremap_desc *des if (!IS_ENABLED(CONFIG_EFI)) return; - if (efi_mem_type(addr) == EFI_RUNTIME_SERVICES_DATA) + if (efi_mem_type(addr) == EFI_RUNTIME_SERVICES_DATA || + (efi_mem_type(addr) == EFI_BOOT_SERVICES_DATA && + efi_mem_attributes(addr) & EFI_MEMORY_RUNTIME)) desc->flags |= IORES_MAP_ENCRYPTED; } -- 2.31.1
Joerg Roedel
2021-Jun-08 09:54 UTC
[PATCH v3 2/7] x86/sev-es: Fix error message in runtime #VC handler
From: Joerg Roedel <jroedel at suse.de> The runtime #VC handler is not "early" anymore. Fix the copy&paste error and remove that word from the error message. Signed-off-by: Joerg Roedel <jroedel at suse.de> --- arch/x86/kernel/sev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 651b81cd648e..4fd997bbf059 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -1369,7 +1369,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication) vc_finish_insn(&ctxt); break; case ES_UNSUPPORTED: - pr_err_ratelimited("Unsupported exit-code 0x%02lx in early #VC exception (IP: 0x%lx)\n", + pr_err_ratelimited("Unsupported exit-code 0x%02lx in #VC exception (IP: 0x%lx)\n", error_code, regs->ip); goto fail; case ES_VMM_ERROR: -- 2.31.1
Joerg Roedel
2021-Jun-08 09:54 UTC
[PATCH v3 3/7] x86/sev-es: Disable IRQs while GHCB is active
From: Joerg Roedel <jroedel at suse.de> The #VC handler only cares about IRQs being disabled while the GHCB is active, as it must not be interrupted by something which could cause another #VC while it holds the GHCB (NMI is the exception for which the backup GHCB is there). Make sure nothing interrupts the code path while the GHCB is active by disabling IRQs in sev_es_get_ghcb() and restoring the previous irq state in sev_es_put_ghcb(). Signed-off-by: Joerg Roedel <jroedel at suse.de> --- arch/x86/kernel/sev.c | 39 +++++++++++++++++++++++++-------------- 1 file changed, 25 insertions(+), 14 deletions(-) diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 4fd997bbf059..2a922d1b03c8 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -192,14 +192,23 @@ void noinstr __sev_es_ist_exit(void) this_cpu_write(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC], *(unsigned long *)ist); } -static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state) +static __always_inline struct ghcb *sev_es_get_ghcb(struct ghcb_state *state, + unsigned long *flags) { struct sev_es_runtime_data *data; struct ghcb *ghcb; + /* + * Nothing shall interrupt this code path while holding the per-cpu + * GHCB. The backup GHCB is only for NMIs interrupting this path. + */ + local_irq_save(*flags); + data = this_cpu_read(runtime_data); ghcb = &data->ghcb_page; + + if (unlikely(data->ghcb_active)) { /* GHCB is already in use - save its contents */ @@ -479,7 +488,8 @@ static enum es_result vc_slow_virt_to_phys(struct ghcb *ghcb, struct es_em_ctxt /* Include code shared with pre-decompression boot stage */ #include "sev-shared.c" -static __always_inline void sev_es_put_ghcb(struct ghcb_state *state) +static __always_inline void sev_es_put_ghcb(struct ghcb_state *state, + unsigned long flags) { struct sev_es_runtime_data *data; struct ghcb *ghcb; @@ -500,14 +510,17 @@ static __always_inline void sev_es_put_ghcb(struct ghcb_state *state) vc_ghcb_invalidate(ghcb); data->ghcb_active = false; } + + local_irq_restore(flags); } void noinstr __sev_es_nmi_complete(void) { struct ghcb_state state; + unsigned long flags; struct ghcb *ghcb; - ghcb = sev_es_get_ghcb(&state); + ghcb = sev_es_get_ghcb(&state, &flags); vc_ghcb_invalidate(ghcb); ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_NMI_COMPLETE); @@ -517,7 +530,7 @@ void noinstr __sev_es_nmi_complete(void) sev_es_wr_ghcb_msr(__pa_nodebug(ghcb)); VMGEXIT(); - sev_es_put_ghcb(&state); + sev_es_put_ghcb(&state, flags); } static u64 get_jump_table_addr(void) @@ -527,9 +540,7 @@ static u64 get_jump_table_addr(void) struct ghcb *ghcb; u64 ret = 0; - local_irq_save(flags); - - ghcb = sev_es_get_ghcb(&state); + ghcb = sev_es_get_ghcb(&state, &flags); vc_ghcb_invalidate(ghcb); ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_AP_JUMP_TABLE); @@ -543,9 +554,7 @@ static u64 get_jump_table_addr(void) ghcb_sw_exit_info_2_is_valid(ghcb)) ret = ghcb->save.sw_exit_info_2; - sev_es_put_ghcb(&state); - - local_irq_restore(flags); + sev_es_put_ghcb(&state, flags); return ret; } @@ -666,9 +675,10 @@ static bool __init sev_es_setup_ghcb(void) static void sev_es_ap_hlt_loop(void) { struct ghcb_state state; + unsigned long flags; struct ghcb *ghcb; - ghcb = sev_es_get_ghcb(&state); + ghcb = sev_es_get_ghcb(&state, &flags); while (true) { vc_ghcb_invalidate(ghcb); @@ -685,7 +695,7 @@ static void sev_es_ap_hlt_loop(void) break; } - sev_es_put_ghcb(&state); + sev_es_put_ghcb(&state, flags); } /* @@ -1333,6 +1343,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication) struct ghcb_state state; struct es_em_ctxt ctxt; enum es_result result; + unsigned long flags; struct ghcb *ghcb; /* @@ -1353,7 +1364,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication) * keep the IRQs disabled to protect us against concurrent TLB flushes. */ - ghcb = sev_es_get_ghcb(&state); + ghcb = sev_es_get_ghcb(&state, &flags); vc_ghcb_invalidate(ghcb); result = vc_init_em_ctxt(&ctxt, regs, error_code); @@ -1361,7 +1372,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication) if (result == ES_OK) result = vc_handle_exitcode(&ctxt, ghcb, error_code); - sev_es_put_ghcb(&state); + sev_es_put_ghcb(&state, flags); /* Done - now check the result */ switch (result) { -- 2.31.1
Joerg Roedel
2021-Jun-08 09:54 UTC
[PATCH v3 4/7] x86/sev-es: Run #VC handler in plain IRQ state
From: Joerg Roedel <jroedel at suse.de> Use irqentry_enter() and irqentry_exit() to track the runtime state of the #VC handler. The reason it ran in NMI mode was solely to make sure nothing interrupts the handler while the GHCB is in use. This is handled now in sev_es_get/put_ghcb() directly, so there is no reason the #VC handler can not run in normal IRQ mode and enjoy the benefits like being able to send signals. Fixes: 62441a1fb532 ("x86/sev-es: Correctly track IRQ states in runtime #VC handler") Signed-off-by: Joerg Roedel <jroedel at suse.de> --- arch/x86/kernel/sev.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 2a922d1b03c8..b563fb747aed 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -1354,8 +1354,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication) return; } - irq_state = irqentry_nmi_enter(regs); - lockdep_assert_irqs_disabled(); + irq_state = irqentry_enter(regs); instrumentation_begin(); /* @@ -1408,7 +1407,7 @@ DEFINE_IDTENTRY_VC_SAFE_STACK(exc_vmm_communication) out: instrumentation_end(); - irqentry_nmi_exit(regs, irq_state); + irqentry_exit(regs, irq_state); return; -- 2.31.1
Joerg Roedel
2021-Jun-08 09:54 UTC
[PATCH v3 5/7] x86/insn-eval: Make 0 a valid RIP for insn_get_effective_ip()
From: Joerg Roedel <jroedel at suse.de> In theory 0 is a valid value for the instruction pointer, so don't use it as the error return value from insn_get_effective_ip(). Signed-off-by: Joerg Roedel <jroedel at suse.de> --- arch/x86/lib/insn-eval.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index a67afd74232c..4eecb9c7c6a0 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -1417,7 +1417,7 @@ void __user *insn_get_addr_ref(struct insn *insn, struct pt_regs *regs) } } -static unsigned long insn_get_effective_ip(struct pt_regs *regs) +static int insn_get_effective_ip(struct pt_regs *regs, unsigned long *ip) { unsigned long seg_base = 0; @@ -1430,10 +1430,12 @@ static unsigned long insn_get_effective_ip(struct pt_regs *regs) if (!user_64bit_mode(regs)) { seg_base = insn_get_seg_base(regs, INAT_SEG_REG_CS); if (seg_base == -1L) - return 0; + return -EINVAL; } - return seg_base + regs->ip; + *ip = seg_base + regs->ip; + + return 0; } /** @@ -1455,8 +1457,7 @@ int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE]) unsigned long ip; int not_copied; - ip = insn_get_effective_ip(regs); - if (!ip) + if (insn_get_effective_ip(regs, &ip)) return 0; not_copied = copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE); @@ -1484,8 +1485,7 @@ int insn_fetch_from_user_inatomic(struct pt_regs *regs, unsigned char buf[MAX_IN unsigned long ip; int not_copied; - ip = insn_get_effective_ip(regs); - if (!ip) + if (insn_get_effective_ip(regs, &ip)) return 0; not_copied = __copy_from_user_inatomic(buf, (void __user *)ip, MAX_INSN_SIZE); -- 2.31.1
Joerg Roedel
2021-Jun-08 09:54 UTC
[PATCH v3 6/7] x86/insn: Extend error reporting from insn_fetch_from_user[_inatomic]()
From: Joerg Roedel <jroedel at suse.de> The error reporting from the insn_fetch_from_user*() functions is not very verbose. Extend it to include information on whether the linear RIP could not be calculated or whether the memory access faulted. This will be used in the SEV-ES code to propagate the correct exception depending on what went wrong during instruction fetch. Signed-off-by: Joerg Roedel <jroedel at suse.de> --- arch/x86/kernel/sev.c | 8 ++++---- arch/x86/kernel/umip.c | 10 ++++------ arch/x86/lib/insn-eval.c | 8 ++++++-- 3 files changed, 14 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index b563fb747aed..2b499affb2fb 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -267,17 +267,17 @@ static int vc_fetch_insn_kernel(struct es_em_ctxt *ctxt, static enum es_result __vc_decode_user_insn(struct es_em_ctxt *ctxt) { char buffer[MAX_INSN_SIZE]; - int res; + int insn_bytes; - res = insn_fetch_from_user_inatomic(ctxt->regs, buffer); - if (!res) { + insn_bytes = insn_fetch_from_user_inatomic(ctxt->regs, buffer); + if (insn_bytes <= 0) { ctxt->fi.vector = X86_TRAP_PF; ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER; ctxt->fi.cr2 = ctxt->regs->ip; return ES_EXCEPTION; } - if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, res)) + if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, insn_bytes)) return ES_DECODE_FAILED; if (ctxt->insn.immediate.got) diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c index 8daa70b0d2da..337178809c89 100644 --- a/arch/x86/kernel/umip.c +++ b/arch/x86/kernel/umip.c @@ -346,14 +346,12 @@ bool fixup_umip_exception(struct pt_regs *regs) if (!regs) return false; - nr_copied = insn_fetch_from_user(regs, buf); - /* - * The insn_fetch_from_user above could have failed if user code - * is protected by a memory protection key. Give up on emulation - * in such a case. Should we issue a page fault? + * Give up on emulation if fetching the instruction failed. Should we + * issue a page fault or a #GP? */ - if (!nr_copied) + nr_copied = insn_fetch_from_user(regs, buf); + if (nr_copied <= 0) return false; if (!insn_decode_from_regs(&insn, regs, buf, nr_copied)) diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 4eecb9c7c6a0..1b5cdf8b7a4e 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -1451,6 +1451,8 @@ static int insn_get_effective_ip(struct pt_regs *regs, unsigned long *ip) * Number of instruction bytes copied. * * 0 if nothing was copied. + * + * -EINVAL if the linear address of the instruction could not be calculated */ int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE]) { @@ -1458,7 +1460,7 @@ int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE]) int not_copied; if (insn_get_effective_ip(regs, &ip)) - return 0; + return -EINVAL; not_copied = copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE); @@ -1479,6 +1481,8 @@ int insn_fetch_from_user(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE]) * Number of instruction bytes copied. * * 0 if nothing was copied. + * + * -EINVAL if the linear address of the instruction could not be calculated */ int insn_fetch_from_user_inatomic(struct pt_regs *regs, unsigned char buf[MAX_INSN_SIZE]) { @@ -1486,7 +1490,7 @@ int insn_fetch_from_user_inatomic(struct pt_regs *regs, unsigned char buf[MAX_IN int not_copied; if (insn_get_effective_ip(regs, &ip)) - return 0; + return -EINVAL; not_copied = __copy_from_user_inatomic(buf, (void __user *)ip, MAX_INSN_SIZE); -- 2.31.1
Joerg Roedel
2021-Jun-08 09:54 UTC
[PATCH v3 7/7] x86/sev-es: Propagate #GP if getting linear instruction address failed
From: Joerg Roedel <jroedel at suse.de> When an instruction is fetched from user-space, segmentation needs to be taken into account. This means that getting the linear address of an instruction can fail. Hardware would raise a #GP exception in that case, but the #VC exception handler would emulate it as a page-fault. The insn_fetch_from_user*() functions now provide the relevant information in case of an failure. Use that and propagate a #GP when the linear address of an instruction to fetch could not be calculated. Signed-off-by: Joerg Roedel <jroedel at suse.de> --- arch/x86/kernel/sev.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c index 2b499affb2fb..737d7198aab1 100644 --- a/arch/x86/kernel/sev.c +++ b/arch/x86/kernel/sev.c @@ -270,11 +270,18 @@ static enum es_result __vc_decode_user_insn(struct es_em_ctxt *ctxt) int insn_bytes; insn_bytes = insn_fetch_from_user_inatomic(ctxt->regs, buffer); - if (insn_bytes <= 0) { + if (insn_bytes == 0) { + /* Nothing could be copied */ ctxt->fi.vector = X86_TRAP_PF; ctxt->fi.error_code = X86_PF_INSTR | X86_PF_USER; ctxt->fi.cr2 = ctxt->regs->ip; return ES_EXCEPTION; + } else if (insn_bytes == -EINVAL) { + /* Effective RIP could not be calculated */ + ctxt->fi.vector = X86_TRAP_GP; + ctxt->fi.error_code = 0; + ctxt->fi.cr2 = 0; + return ES_EXCEPTION; } if (!insn_decode_from_regs(&ctxt->insn, ctxt->regs, buffer, insn_bytes)) -- 2.31.1