From: Don Slutz <dslutz@verizon.com> Without this xenctx can return incorrect register values. Also xc_translate_foreign_address might do the wrong thing. Patch #1 is unit test code. Patch #2 is quick and dirty linux kernel module to offline some cpus Patch #3 fixes the offline VCPU handling. Patch #4 fixes hvm_save_one to be able to return PIC instance 1. State: # xl list Name ID Mem VCPUs State Time(s) Domain-0 0 2048 8 r----- 58.0 P-8-0 1 3080 8 r----- 27.1 # xl list Name ID Mem VCPUs State Time(s) Domain-0 0 2048 8 r----- 66.8 P-8-0 1 3080 1 ------ 45.0 Before: # ./check-hvmctx 1 Check HVM save record vs partial for domain 1 Error: entry 1: type 2 instance 2, length 1024 mismatch! CPU: rax 0x0000000000000000 rbx 0xffffffff8006b287 (And PIC issue:) # ./check-hvmctx 1 Check HVM save record vs partial for domain 1 Error: entry 1: type 2 instance 2, length 1024 mismatch! CPU: rax 0x0000000000000000 rbx 0xffffffff8006b287 Error: xc_domain_hvm_getcontext_partial: entry 3: type 3 instance 1, length 8: Invalid argument Error: entry 3: type 3 instance 1, length 8 mismatch! PIC: IRQ base 0x28, irr 0x11, imr 0xff, isr 0 After: # ./check-hvmctx 1 Check HVM save record vs partial for domain 1 Don Slutz (4): tools/test: Add check-hvmctx Add tools/tests/offline_module hvm_save_one: return correct data. hvm_save_one: allow the 2nd instance to be fetched for PIC. tools/tests/check-hvmctx/.gitignore | 1 + tools/tests/check-hvmctx/Makefile | 34 ++ tools/tests/check-hvmctx/README | 21 + tools/tests/check-hvmctx/check-hvmctx.c | 661 ++++++++++++++++++++++++++++++++ tools/tests/offline_module/.gitignore | 8 + tools/tests/offline_module/Makefile | 10 + tools/tests/offline_module/README | 37 ++ tools/tests/offline_module/offline.c | 89 +++++ xen/common/hvm/save.c | 31 +- 9 files changed, 882 insertions(+), 10 deletions(-) create mode 100644 tools/tests/check-hvmctx/.gitignore create mode 100644 tools/tests/check-hvmctx/Makefile create mode 100644 tools/tests/check-hvmctx/README create mode 100644 tools/tests/check-hvmctx/check-hvmctx.c create mode 100644 tools/tests/offline_module/.gitignore create mode 100644 tools/tests/offline_module/Makefile create mode 100644 tools/tests/offline_module/README create mode 100644 tools/tests/offline_module/offline.c -- 1.8.4
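For reference, the libxc call at the centre of this series is xc_domain_hvm_getcontext_partial(), which is how xenctx-style tools read a single VCPU's register block. The sketch below shows roughly how such a lookup is made; it is an illustration only (error handling omitted), assuming the libxc and public HVM save headers that check-hvmctx below also uses:

    #include <xenctrl.h>
    #include <xen/hvm/save.h>

    /* Fetch the CPU save record of one VCPU of an HVM guest.
     * Returns 0 on success, non-zero on failure, as libxc does. */
    static int get_vcpu_cpu_record(xc_interface *xch, uint32_t domid,
                                   uint16_t vcpu, HVM_SAVE_TYPE(CPU) *cpu)
    {
        /* The instance number is the VCPU id for per-VCPU record types.
         * Before patch 3, an offline VCPU with a lower id than 'vcpu'
         * can make this return another VCPU's registers. */
        return xc_domain_hvm_getcontext_partial(xch, domid,
                                                HVM_SAVE_CODE(CPU), vcpu,
                                                cpu, sizeof(*cpu));
    }

The "Before" output above shows that situation: the record handed back by the partial call belongs to a different VCPU than the one that was asked for.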
From: Don Slutz <dslutz@verizon.com> This is a unit test for xc_domain_hvm_getcontext_partial vs xc_domain_hvm_getcontext. Signed-off-by: Don Slutz <dslutz@verizon.com> --- tools/tests/check-hvmctx/.gitignore | 1 + tools/tests/check-hvmctx/Makefile | 34 ++ tools/tests/check-hvmctx/README | 21 + tools/tests/check-hvmctx/check-hvmctx.c | 661 ++++++++++++++++++++++++++++++++ 4 files changed, 717 insertions(+) create mode 100644 tools/tests/check-hvmctx/.gitignore create mode 100644 tools/tests/check-hvmctx/Makefile create mode 100644 tools/tests/check-hvmctx/README create mode 100644 tools/tests/check-hvmctx/check-hvmctx.c diff --git a/tools/tests/check-hvmctx/.gitignore b/tools/tests/check-hvmctx/.gitignore new file mode 100644 index 0000000..e2dabd8 --- /dev/null +++ b/tools/tests/check-hvmctx/.gitignore @@ -0,0 +1 @@ +check-hvmctx diff --git a/tools/tests/check-hvmctx/Makefile b/tools/tests/check-hvmctx/Makefile new file mode 100644 index 0000000..dce7508 --- /dev/null +++ b/tools/tests/check-hvmctx/Makefile @@ -0,0 +1,34 @@ +XEN_ROOT=$(CURDIR)/../../.. +include $(XEN_ROOT)/tools/Rules.mk + +CFLAGS += -Werror + +CFLAGS += $(CFLAGS_libxenctrl) +CFLAGS += $(CFLAGS_xeninclude) +CFLAGS += $(CFLAGS_libxenstore) + +HDRS = $(wildcard *.h) + +TARGETS-$(CONFIG_X86) += check-hvmctx +TARGETS := $(TARGETS-y) + +# Include configure output (config.h) to headers search path +CFLAGS += -I$(XEN_ROOT)/tools + +# Allow special builds +CFLAGS += $(CHECK_HVMCTX) + +.PHONY: all +all: build + +.PHONY: build +build: $(TARGETS) Makefile + +.PHONY: clean +clean: + $(RM) *.o $(TARGETS) *~ $(DEPS) + +check-hvmctx: check-hvmctx.o Makefile + $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS) + +-include $(DEPS) diff --git a/tools/tests/check-hvmctx/README b/tools/tests/check-hvmctx/README new file mode 100644 index 0000000..0d70df4 --- /dev/null +++ b/tools/tests/check-hvmctx/README @@ -0,0 +1,21 @@ +Check that xc_domain_hvm_getcontext_partial gets the same data as +xc_domain_hvm_getcontext. + +To see the bug about offline VCPU(s) you can use the offline_module +(offline.ko) in a domU. + +To see the bug about PIC: + +make clean;make CHECK_HVMCTX="-DDOPIC" + +To never report a time based issue (I see it about 1 out of 50): + +make clean;make CHECK_HVMCTX="-DFULLCLEAN" + +To see time based issues: + +make clean;make CHECK_HVMCTX="-DNOTCLEAN" + +Normal: + +make clean;make diff --git a/tools/tests/check-hvmctx/check-hvmctx.c b/tools/tests/check-hvmctx/check-hvmctx.c new file mode 100644 index 0000000..db331f9 --- /dev/null +++ b/tools/tests/check-hvmctx/check-hvmctx.c @@ -0,0 +1,661 @@ +/* + * check-hvmctx.c + * + * check that xc_domain_hvm_getcontext_partial gets the same data as xc_domain_hvm_getcontext. + * + * Based on xen-hvmctx.c + * + * Tim Deegan <Tim.Deegan@citrix.com> + * Copyright (c) 2008 Citrix Systems, Inc. + * Copyright (C) 2013 by Verizon. + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ + +#include <inttypes.h> +#include <stdio.h> +#include <stdlib.h> +#include <stddef.h> +#include <stdint.h> +#include <unistd.h> +#include <string.h> +#include <errno.h> +#include <limits.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <arpa/inet.h> + +#define BITS_PER_LONG __WORDSIZE +#define BITS_TO_LONGS(bits) \ + (((bits)+BITS_PER_LONG-1)/BITS_PER_LONG) +#define DECLARE_BITMAP(name,bits) \ + unsigned long name[BITS_TO_LONGS(bits)] + +#include <xenctrl.h> +#include <xen/xen.h> +#include <xen/domctl.h> +#include <xen/hvm/save.h> + +static uint8_t *buf = NULL; +static uint32_t len; +static uint32_t off; +static int debug = 0; + +#define READ(_x) do { \ + if ( len - off < sizeof (_x) ) \ + { \ + fprintf(stderr, \ + "Error: need another %u bytes, only %u available", \ + (unsigned int)sizeof(_x), len - off); \ + exit(1); \ + } \ + memcpy(&(_x), buf + off, sizeof (_x)); \ + off += sizeof (_x); \ + } while (0) + +static void vms_dump(const void *d, int size) +{ + int len, i, j, c; + const unsigned char *buf = d; + + for(i = 0; i < size; i += 16) { + len = size - i; + if (len > 16) + len = 16; + for(j = len; j < 16; j++) { + if ((j != 15) && ((j & 3) == 3)) + printf(" "); + printf(" "); + } + for(j = len - 1; j >= 0; j--) { + if ((j != 15) && ((j & 3) == 3)) + printf(" "); + printf("%02x", buf[i+j]); + } + printf(" %08x: ", i); + for(j = 0; j < len; j++) { + c = buf[i+j]; + if (c < '' '' || c > ''~'') + c = ''.''; + printf("%c", c); + } + printf("\n"); + } +} + +static void dump_header(void) +{ + HVM_SAVE_TYPE(HEADER) h; + READ(h); + printf(" Header: magic %#lx, version %lu\n", + (unsigned long) h.magic, (unsigned long) h.version); + if (debug & 0x02) { + printf(" Xen changeset %llx\n", + (unsigned long long) h.changeset); + printf(" CPUID[0][%%eax] 0x%.8lx\n", (unsigned long) h.cpuid); + printf(" gtsc_khz %lu\n", (unsigned long) h.gtsc_khz); + } +} + +struct fpu_mm { + uint64_t lo; + uint16_t hi; + uint16_t pad[3]; +} __attribute__((packed)); + +struct fpu_xmm { + uint64_t lo; + uint64_t hi; +}; + +struct fpu_regs { + uint16_t fcw; + uint16_t fsw; + uint8_t ftw; + uint8_t res0; + uint16_t fop; + uint64_t fpuip; + uint64_t fpudp; + uint32_t mxcsr; + uint32_t mxcsr_mask; + struct fpu_mm mm[8]; + struct fpu_xmm xmm[16]; + uint64_t res1[12]; +} __attribute__((packed)); + +static void dump_fpu(void *p) +{ + struct fpu_regs *r = p; + int i; + + printf(" FPU: fcw 0x%4.4x fsw 0x%4.4x\n", + (unsigned)r->fcw, (unsigned)r->fsw); + if (debug & 0x02) { + printf(" ftw 0x%2.2x (0x%2.2x) fop 0x%4.4x\n" + " fpuip 0x%16.16"PRIx64" fpudp 0x%16.16"PRIx64"\n" + " mxcsr 0x%8.8lx mask 0x%8.8lx\n", + (unsigned)r->ftw, (unsigned)r->res0, (unsigned)r->fop, + r->fpuip, r->fpudp, + (unsigned long)r->mxcsr, (unsigned long)r->mxcsr_mask); + + for ( i = 0 ; i < 8 ; i++ ) + printf(" mm%i 0x%4.4x%16.16"PRIx64" (0x%4.4x%4.4x%4.4x)\n", + i, r->mm[i].hi, r->mm[i].lo, + r->mm[i].pad[2], r->mm[i].pad[1], r->mm[i].pad[0]); + + for ( i = 0 ; i < 16 ; i++ ) + printf(" xmm%2.2i 0x%16.16"PRIx64"%16.16"PRIx64"\n", + i, r->xmm[i].hi, 
r->xmm[i].lo); + + for ( i = 0 ; i < 6 ; i++ ) + printf(" (0x%16.16"PRIx64"%16.16"PRIx64")\n", + r->res1[2*i+1], r->res1[2*i]); + } +} + +static void dump_cpu(void) +{ + HVM_SAVE_TYPE(CPU) c; + READ(c); + printf(" CPU: rax 0x%16.16llx rbx 0x%16.16llx\n", + (unsigned long long) c.rax, (unsigned long long) c.rbx); + if (debug & 0x02) { + printf(" rcx 0x%16.16llx rdx 0x%16.16llx\n" + " rbp 0x%16.16llx rsi 0x%16.16llx\n" + " rdi 0x%16.16llx rsp 0x%16.16llx\n" + " r8 0x%16.16llx r9 0x%16.16llx\n" + " r10 0x%16.16llx r11 0x%16.16llx\n" + " r12 0x%16.16llx r13 0x%16.16llx\n" + " r14 0x%16.16llx r15 0x%16.16llx\n" + " rip 0x%16.16llx rflags 0x%16.16llx\n" + " cr0 0x%16.16llx cr2 0x%16.16llx\n" + " cr3 0x%16.16llx cr4 0x%16.16llx\n" + " dr0 0x%16.16llx dr1 0x%16.16llx\n" + " dr2 0x%16.16llx dr3 0x%16.16llx\n" + " dr6 0x%16.16llx dr7 0x%16.16llx\n" + " cs 0x%8.8x (0x%16.16llx + 0x%8.8x / 0x%5.5x)\n" + " ds 0x%8.8x (0x%16.16llx + 0x%8.8x / 0x%5.5x)\n" + " es 0x%8.8x (0x%16.16llx + 0x%8.8x / 0x%5.5x)\n" + " fs 0x%8.8x (0x%16.16llx + 0x%8.8x / 0x%5.5x)\n" + " gs 0x%8.8x (0x%16.16llx + 0x%8.8x / 0x%5.5x)\n" + " ss 0x%8.8x (0x%16.16llx + 0x%8.8x / 0x%5.5x)\n" + " tr 0x%8.8x (0x%16.16llx + 0x%8.8x / 0x%5.5x)\n" + " ldtr 0x%8.8x (0x%16.16llx + 0x%8.8x / 0x%5.5x)\n" + " idtr (0x%16.16llx + 0x%8.8x)\n" + " gdtr (0x%16.16llx + 0x%8.8x)\n" + " sysenter cs 0x%8.8llx eip 0x%16.16llx esp 0x%16.16llx\n" + " shadow gs 0x%16.16llx\n" + " MSR flags 0x%16.16llx lstar 0x%16.16llx\n" + " star 0x%16.16llx cstar 0x%16.16llx\n" + " sfmask 0x%16.16llx efer 0x%16.16llx\n" + " tsc 0x%16.16llx\n" + " event 0x%8.8lx error 0x%8.8lx\n", + (unsigned long long) c.rcx, (unsigned long long) c.rdx, + (unsigned long long) c.rbp, (unsigned long long) c.rsi, + (unsigned long long) c.rdi, (unsigned long long) c.rsp, + (unsigned long long) c.r8, (unsigned long long) c.r9, + (unsigned long long) c.r10, (unsigned long long) c.r11, + (unsigned long long) c.r12, (unsigned long long) c.r13, + (unsigned long long) c.r14, (unsigned long long) c.r15, + (unsigned long long) c.rip, (unsigned long long) c.rflags, + (unsigned long long) c.cr0, (unsigned long long) c.cr2, + (unsigned long long) c.cr3, (unsigned long long) c.cr4, + (unsigned long long) c.dr0, (unsigned long long) c.dr1, + (unsigned long long) c.dr2, (unsigned long long) c.dr3, + (unsigned long long) c.dr6, (unsigned long long) c.dr7, + c.cs_sel, (unsigned long long) c.cs_base, c.cs_limit, c.cs_arbytes, + c.ds_sel, (unsigned long long) c.ds_base, c.ds_limit, c.ds_arbytes, + c.es_sel, (unsigned long long) c.es_base, c.es_limit, c.es_arbytes, + c.fs_sel, (unsigned long long) c.fs_base, c.fs_limit, c.fs_arbytes, + c.gs_sel, (unsigned long long) c.gs_base, c.gs_limit, c.gs_arbytes, + c.ss_sel, (unsigned long long) c.ss_base, c.ss_limit, c.ss_arbytes, + c.tr_sel, (unsigned long long) c.tr_base, c.tr_limit, c.tr_arbytes, + c.ldtr_sel, (unsigned long long) c.ldtr_base, + c.ldtr_limit, c.ldtr_arbytes, + (unsigned long long) c.idtr_base, c.idtr_limit, + (unsigned long long) c.gdtr_base, c.gdtr_limit, + (unsigned long long) c.sysenter_cs, + (unsigned long long) c.sysenter_eip, + (unsigned long long) c.sysenter_esp, + (unsigned long long) c.shadow_gs, + (unsigned long long) c.msr_flags, + (unsigned long long) c.msr_lstar, + (unsigned long long) c.msr_star, + (unsigned long long) c.msr_cstar, + (unsigned long long) c.msr_syscall_mask, + (unsigned long long) c.msr_efer, + (unsigned long long) c.tsc, + (unsigned long) c.pending_event, (unsigned long) c.error_code); + dump_fpu(&c.fpu_regs); + } +} + + 
+static void dump_pic(void) +{ + HVM_SAVE_TYPE(PIC) p; + READ(p); + printf(" PIC: IRQ base %#x, irr %#x, imr %#x, isr %#x\n", + p.irq_base, p.irr, p.imr, p.isr); + if (debug & 0x02) { + + printf(" init_state %u, priority_add %u, readsel_isr %u, poll %u\n", + p.init_state, p.priority_add, p.readsel_isr, p.poll); + printf(" auto_eoi %u, rotate_on_auto_eoi %u\n", + p.auto_eoi, p.rotate_on_auto_eoi); + printf(" special_fully_nested_mode %u, special_mask_mode %u\n", + p.special_fully_nested_mode, p.special_mask_mode); + printf(" is_master %u, elcr %#x, int_output %#x\n", + p.is_master, p.elcr, p.int_output); + } +} + + +static void dump_ioapic(void) +{ + int i; + HVM_SAVE_TYPE(IOAPIC) p; + READ(p); + printf(" IOAPIC: base_address %#llx, ioregsel %#x id %#x\n", + (unsigned long long) p.base_address, p.ioregsel, p.id); + if (debug & 0x02) { + for ( i = 0; i < VIOAPIC_NUM_PINS; i++ ) + { + printf(" pin %.2i: 0x%.16llx\n", i, + (unsigned long long) p.redirtbl[i].bits); + } + } +} + +static void dump_lapic(void) +{ + HVM_SAVE_TYPE(LAPIC) p; + READ(p); + printf(" LAPIC: base_msr %#llx, disabled %#x, timer_divisor %#x\n", + (unsigned long long) p.apic_base_msr, p.disabled, p.timer_divisor); +} + +static void dump_lapic_regs(void) +{ + unsigned int i; + HVM_SAVE_TYPE(LAPIC_REGS) r; + READ(r); + printf(" LAPIC registers:\n"); + if (debug & 0x02) { + for ( i = 0 ; i < 0x400 ; i += 32 ) + { + printf(" 0x%4.4x: 0x%16.16llx 0x%4.4x: 0x%16.16llx\n", + i, *(unsigned long long *)&r.data[i], + i + 16, *(unsigned long long *)&r.data[i + 16]); + } + } +} + +static void dump_pci_irq(void) +{ + HVM_SAVE_TYPE(PCI_IRQ) i; + READ(i); + printf(" PCI IRQs: 0x%16.16llx%16.16llx\n", + (unsigned long long) i.pad[0], (unsigned long long) i.pad[1]); +} + +static void dump_isa_irq(void) +{ + HVM_SAVE_TYPE(ISA_IRQ) i; + READ(i); + printf(" ISA IRQs: 0x%4.4llx\n", + (unsigned long long) i.pad[0]); +} + +static void dump_pci_link(void) +{ + HVM_SAVE_TYPE(PCI_LINK) l; + READ(l); + printf(" PCI LINK: %u %u %u %u\n", + l.route[0], l.route[1], l.route[2], l.route[3]); +} + +static void dump_pit(void) +{ + int i; + HVM_SAVE_TYPE(PIT) p; + READ(p); + printf(" PIT: speaker %s\n", p.speaker_data_on ? 
"on" : "off"); + if (debug & 0x02) { + for ( i = 0 ; i < 2 ; i++ ) + { + printf(" ch %1i: count %#x, latched_count %#x, count_latched %u\n", + i, p.channels[i].count, p.channels[i].latched_count, + p.channels[i].count_latched); + printf(" status %#x, status_latched %#x\n", + p.channels[i].status, p.channels[i].status_latched); + printf(" rd_state %#x, wr_state %#x, wr_latch %#x, rw_mode %#x\n", + p.channels[i].read_state, p.channels[i].write_state, + p.channels[i].write_latch, p.channels[i].rw_mode); + printf(" mode %#x, bcd %#x, gate %#x\n", + p.channels[i].mode, p.channels[i].bcd, p.channels[i].gate); + } + } +} + +static void dump_rtc(void) +{ + HVM_SAVE_TYPE(RTC) r; + READ(r); + printf(" RTC: regs 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x\n", + r.cmos_data[0], r.cmos_data[1], r.cmos_data[2], r.cmos_data[3], + r.cmos_data[4], r.cmos_data[5], r.cmos_data[6], r.cmos_data[7]); + if (debug & 0x02) { + printf(" 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x 0x%2.2x, index 0x%2.2x\n", + r.cmos_data[8], r.cmos_data[9], r.cmos_data[10], r.cmos_data[11], + r.cmos_data[12], r.cmos_data[13], r.cmos_index); + } + +} + +static void dump_hpet(void) +{ + int i; + HVM_SAVE_TYPE(HPET) h; + READ(h); + printf(" HPET: capability %#llx config %#llx\n", + (unsigned long long) h.capability, + (unsigned long long) h.config); + if (debug & 0x02) { + printf(" isr %#llx counter %#llx\n", + (unsigned long long) h.isr, + (unsigned long long) h.mc64); + for ( i = 0; i < HPET_TIMER_NUM; i++ ) + { + printf(" timer%i config %#llx cmp %#llx\n", i, + (unsigned long long) h.timers[i].config, + (unsigned long long) h.timers[i].cmp); + printf(" timer%i period %#llx fsb %#llx\n", i, + (unsigned long long) h.period[i], + (unsigned long long) h.timers[i].fsb); + } + } +} + +static void dump_pmtimer(void) +{ + HVM_SAVE_TYPE(PMTIMER) p; + READ(p); + printf(" ACPI PM: TMR_VAL 0x%x, PM1a_STS 0x%x, PM1a_EN 0x%x\n", + p.tmr_val, (unsigned) p.pm1a_sts, (unsigned) p.pm1a_en); +} + +static void dump_mtrr(void) +{ + HVM_SAVE_TYPE(MTRR) p; + int i; + READ(p); + printf(" MTRR: PAT 0x%llx, cap 0x%llx, default 0x%llx\n", + (unsigned long long) p.msr_pat_cr, + (unsigned long long) p.msr_mtrr_cap, + (unsigned long long) p.msr_mtrr_def_type); + if (debug & 0x02) { + for ( i = 0 ; i < MTRR_VCNT ; i++ ) + printf(" var %i 0x%16.16llx 0x%16.16llx\n", i, + (unsigned long long) p.msr_mtrr_var[2 * i], + (unsigned long long) p.msr_mtrr_var[2 * i + 1]); + for ( i = 0 ; i < NUM_FIXED_MSR ; i++ ) + printf(" fixed %.2i 0x%16.16llx\n", i, + (unsigned long long) p.msr_mtrr_fixed[i]); + } +} + +static void dump_viridian_domain(void) +{ + HVM_SAVE_TYPE(VIRIDIAN_DOMAIN) p; + READ(p); + printf(" VIRIDIAN_DOMAIN: hypercall gpa 0x%llx, guest_os_id 0x%llx\n", + (unsigned long long) p.hypercall_gpa, + (unsigned long long) p.guest_os_id); +} + +static void dump_viridian_vcpu(void) +{ + HVM_SAVE_TYPE(VIRIDIAN_VCPU) p; + READ(p); + printf(" VIRIDIAN_VCPU: apic_assist 0x%llx\n", + (unsigned long long) p.apic_assist); +} + +static void dump_vmce_vcpu(void) +{ + HVM_SAVE_TYPE(VMCE_VCPU) p; + READ(p); + printf(" VMCE_VCPU: caps %" PRIx64 "\n", p.caps); + if (debug & 0x02) { + printf(" VMCE_VCPU: bank0 mci_ctl2 %" PRIx64 "\n", p.mci_ctl2_bank0); + printf(" VMCE_VCPU: bank1 mci_ctl2 %" PRIx64 "\n", p.mci_ctl2_bank1); + } +} + +static void dump_tsc_adjust(void) +{ + HVM_SAVE_TYPE(TSC_ADJUST) p; + READ(p); + printf(" TSC_ADJUST: tsc_adjust %" PRIx64 "\n", p.tsc_adjust); +} + +static void dump_xsave(int xsave_size) +{ + uint8_t p[xsave_size]; + READ(p); + 
printf(" CPU XSAVE:\n"); + if (debug & 0x02) { + vms_dump(&p, sizeof(p)); + } +} + +int main(int argc, char **argv) +{ + int entry, domid, ret; + xc_interface *xch; + uint8_t *tbuf = NULL; + int retval = 0; + + struct hvm_save_descriptor desc; + + if ( argc < 2 || argc > 3 || !argv[1] || (domid = atoi(argv[1])) < 0 ) + { + fprintf(stderr, "usage: %s <domid>\n", argv[0]); + exit(1); + } + if (argc == 3) + debug = atoi(argv[2]); + + xch = xc_interface_open(0,0,0); + if ( !xch ) + { + fprintf(stderr, "Error: can''t open libxc handle\n"); + exit(1); + } + + ret = xc_domain_pause(xch, domid); + if (ret < 0) { + perror("xc_domain_pause"); + exit(-1); + } + + /* + * Calling with zero buffer length should return the buffer length + * required. + */ + len = xc_domain_hvm_getcontext(xch, domid, 0, 0); + if ( len == (uint32_t) -1 ) + { + fprintf(stderr, "Error: can''t get record length for dom %i\n", domid); + exit(1); + } + buf = malloc(len); + if ( buf == NULL ) + { + fprintf(stderr, "Error: can''t allocate %u bytes\n", len); + exit(1); + } + tbuf = malloc(len); + if ( tbuf == NULL ) + { + fprintf(stderr, "Error: can''t allocate %u bytes\n", len); + exit(1); + } + len = xc_domain_hvm_getcontext(xch, domid, buf, len); + if ( len == (uint32_t) -1 ) + { + fprintf(stderr, "Error: can''t get HVM record for dom %i\n", domid); + exit(1); + } + off = 0; + + /* Say hello */ + printf("Check HVM save record vs partial for domain %i\n", domid); + if (debug & 0x20) { +#ifndef DOPIC + printf("PIC: Cannot get just 2nd instance and can change -- time based.\n"); +#endif +#ifndef NOTCLEAN + printf("HPET: Changes -- time based.\n"); + printf("PMTIMER: Changes -- time based.\n"); +#endif +#if defined(FULLCLEAN) || !defined(NOTCLEAN) + printf("IOAPIC: Can change -- time based.\n"); + printf("LAPIC_REGS: Can change -- time based.\n"); + printf("PCI_IRQ: Can change -- time based.\n"); + printf("ISA_IRQ: Can change -- time based.\n"); +#endif + } + + entry = 0; + do { + READ(desc); + if (debug & 0x01) + printf("Entry %i: type %u instance %u, length %u\n", + entry, (unsigned) desc.typecode, + (unsigned) desc.instance, (unsigned) desc.length); + switch (desc.typecode) + { + case HVM_SAVE_CODE(HEADER): /* Not supported by xc_domain_hvm_getcontext_partial */ + case HVM_SAVE_CODE(END): /* Not supported by xc_domain_hvm_getcontext_partial */ +#ifndef DOPIC + case HVM_SAVE_CODE(PIC): /* Cannot get just 2nd instance and can change -- time based */ +#endif +#ifndef NOTCLEAN + case HVM_SAVE_CODE(HPET): /* Changes -- time based */ + case HVM_SAVE_CODE(PMTIMER): /* Changes -- time based */ +#endif +#if defined(FULLCLEAN) || !defined(NOTCLEAN) + case HVM_SAVE_CODE(IOAPIC): /* Can change -- time based */ + case HVM_SAVE_CODE(LAPIC_REGS): /* Can change -- time based */ + case HVM_SAVE_CODE(PCI_IRQ): /* Can change -- time based */ + case HVM_SAVE_CODE(ISA_IRQ): /* Can change -- time based */ +#endif + off += (desc.length); + break; + default: + if (xc_domain_hvm_getcontext_partial( + xch, domid, desc.typecode, desc.instance, + tbuf, desc.length) != 0) { + fprintf(stderr, "Error: xc_domain_hvm_getcontext_partial: entry %i: type %u instance %u, length %u: ", + entry, (unsigned) desc.typecode, + (unsigned) desc.instance, (unsigned) desc.length); + perror(""); + retval = 42; + memset(tbuf, 0xee, desc.length); + } + ret = desc.length; +#ifndef NOTCLEAN + if (desc.typecode == HVM_SAVE_CODE(CPU)) + ret -= 16; /* Skip TSC; it changes while paused... 
*/ +#endif + if (memcmp(tbuf, buf + off, ret) != 0) { + printf("Error: entry %i: type %u instance %u, length %u mismatch!\n", + entry, (unsigned) desc.typecode, + (unsigned) desc.instance, (unsigned) desc.length); + retval = 1; + if (debug & 0x04) { + int i; + uint8_t *gbuf = buf + off; + + for (i = 0; i < ret; i++) { + if (gbuf[i] != tbuf[i]) + printf("[%03x] good=%02x bad=%02x\n", + i, gbuf[i], tbuf[i]); + } + } + if (debug & 0x10) { + printf("good:\n"); + vms_dump(buf + off, desc.length); + printf("bad:\n"); + vms_dump(tbuf, desc.length); + } + switch (desc.typecode) + { + case HVM_SAVE_CODE(HEADER): dump_header(); break; + case HVM_SAVE_CODE(CPU): dump_cpu(); break; + case HVM_SAVE_CODE(PIC): dump_pic(); break; + case HVM_SAVE_CODE(IOAPIC): dump_ioapic(); break; + case HVM_SAVE_CODE(LAPIC): dump_lapic(); break; + case HVM_SAVE_CODE(LAPIC_REGS): dump_lapic_regs(); break; + case HVM_SAVE_CODE(PCI_IRQ): dump_pci_irq(); break; + case HVM_SAVE_CODE(ISA_IRQ): dump_isa_irq(); break; + case HVM_SAVE_CODE(PCI_LINK): dump_pci_link(); break; + case HVM_SAVE_CODE(PIT): dump_pit(); break; + case HVM_SAVE_CODE(RTC): dump_rtc(); break; + case HVM_SAVE_CODE(HPET): dump_hpet(); break; + case HVM_SAVE_CODE(PMTIMER): dump_pmtimer(); break; + case HVM_SAVE_CODE(MTRR): dump_mtrr(); break; + case HVM_SAVE_CODE(VIRIDIAN_DOMAIN): dump_viridian_domain(); break; + case HVM_SAVE_CODE(VIRIDIAN_VCPU): dump_viridian_vcpu(); break; + case HVM_SAVE_CODE(VMCE_VCPU): dump_vmce_vcpu(); break; + case HVM_SAVE_CODE(TSC_ADJUST): dump_tsc_adjust(); break; + case CPU_XSAVE_CODE: dump_xsave(desc.length); break; + case HVM_SAVE_CODE(END): break; + default: + printf(" ** Don''t understand type %u: skipping\n", + (unsigned) desc.typecode); + off += (desc.length); + } + } else { + off += (desc.length); + } + break; + } + entry++; + } while ( desc.typecode != HVM_SAVE_CODE(END) && off < len ); + + ret = xc_domain_unpause(xch, domid); + if (ret < 0) { + perror("xc_domain_unpause"); + exit(1); + } + + return retval; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ -- 1.8.4
From: Don Slutz <dslutz@verizon.com> This is a quick and dirty linux kernel module to offline some cpus as far as Xen knows. This can be used to re-create the bug in hvm_save_one about returning the wrong data. Signed-off-by: Don Slutz <dslutz@verizon.com> --- tools/tests/offline_module/.gitignore | 8 ++++ tools/tests/offline_module/Makefile | 10 ++++ tools/tests/offline_module/README | 37 +++++++++++++++ tools/tests/offline_module/offline.c | 89 +++++++++++++++++++++++++++++++++++ 4 files changed, 144 insertions(+) create mode 100644 tools/tests/offline_module/.gitignore create mode 100644 tools/tests/offline_module/Makefile create mode 100644 tools/tests/offline_module/README create mode 100644 tools/tests/offline_module/offline.c diff --git a/tools/tests/offline_module/.gitignore b/tools/tests/offline_module/.gitignore new file mode 100644 index 0000000..0a0b67b --- /dev/null +++ b/tools/tests/offline_module/.gitignore @@ -0,0 +1,8 @@ +.offline.ko.cmd +.offline.mod.o.cmd +.offline.o.cmd +.tmp_versions +Module.symvers +modules.order +offline.ko +offline.mod.c diff --git a/tools/tests/offline_module/Makefile b/tools/tests/offline_module/Makefile new file mode 100644 index 0000000..8fd246f --- /dev/null +++ b/tools/tests/offline_module/Makefile @@ -0,0 +1,10 @@ +obj-m := offline.o + +KDIR := /lib/modules/$(shell uname -r)/build +PWD := $(shell pwd) + +all: + make -C $(KDIR) SUBDIRS=$(PWD) modules + +clean: + make -C $(KDIR) SUBDIRS=$(PWD) clean diff --git a/tools/tests/offline_module/README b/tools/tests/offline_module/README new file mode 100644 index 0000000..9bc0e01 --- /dev/null +++ b/tools/tests/offline_module/README @@ -0,0 +1,37 @@ +This is a quick and dirty linux kernel module to offline some cpus +as far as Xen knows. + +This can be used to re-create the bug in hvm_save_one about +returning the wrong data. + +You only need the 2 files: Makefile and offline.c alone in some +directory on DomU. "make" should build the kernel module. + +to leave just cpu 3 (id #2) of 8 (id #0-#7) running: + +taskset 0x4 insmod offline.ko who=0xff + + +Note: Not sure when, but older kernels (2.6.18 for example) have: + +/* + * smp_call_function - run a function on all other CPUs. + * @func: The function to run. This must be fast and non-blocking. + * @info: An arbitrary pointer to pass to the function. + * @nonatomic: currently unused. + * @wait: If true, wait (atomically) until function has completed on other + * CPUs. + * Returns 0 on success, else a negative status code. Does not return until + * remote CPUs are nearly ready to execute func or are or have executed. + + +Instead of: + +/** + * smp_call_function(): Run a function on all other CPUs. + * @func: The function to run. This must be fast and non-blocking. + * @info: An arbitrary pointer to pass to the function. + * @wait: If true, wait (atomically) until function has completed + * on other CPUs. + * + * Returns 0. diff --git a/tools/tests/offline_module/offline.c b/tools/tests/offline_module/offline.c new file mode 100644 index 0000000..c538848 --- /dev/null +++ b/tools/tests/offline_module/offline.c @@ -0,0 +1,89 @@ +/* offline + * + * Copyright (C) 2013 by Cloud Switch, Inc. + * Copyright (C) 2013 by Terremark. + * Copyright (C) 2013 by Verizon. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program. If not, see <http://www.gnu.org/licenses/>. + * + */ + +#include <linux/module.h> /* needed by all modules */ +#include <linux/init.h> /* needed for macros */ +#include <linux/kernel.h> /* needed for printk */ +#include <linux/dmi.h> /* needed for dmi_name_in_vendors */ + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Don Slutz"); +MODULE_DESCRIPTION("offline some cpus for xen testing"); + +static int debug = 0; +module_param(debug, int, 0); +MODULE_PARM_DESC(debug, "Debug level (0=none,1=all)"); + +static int who = 0; +module_param(who, int, 0); +MODULE_PARM_DESC(who, "Bit mask of cpus to off line. Max is 32 bits"); + +static int wait = 0; +module_param(wait, int, 0); +MODULE_PARM_DESC(wait, "Wait for offlined cpus"); + +static void offline_self(void *v) +{ + int cpu = smp_processor_id(); + + if (cpu < (sizeof(wait) * 8)) + if (who & (1 << cpu)) + __asm__ ("cli\n\thlt"); +} + +int init_module(void) +{ + + char *version = "$Revision: 1.0 $"; + int verLen = strlen(version); + int cpu = smp_processor_id(); + + if (debug) + printk(KERN_WARNING "offline%.*s who=0x%x wait=%d cpu=%d\n", + verLen - 12, version + 10, who, wait, cpu); + + if (cpu < (sizeof(wait) * 8)) + who &= ~(1 << cpu); +#ifdef OLD_KERNEL + smp_call_function(offline_self, NULL, 0, wait); +#else + smp_call_function(offline_self, NULL, wait); +#endif + if (debug) + printk(KERN_WARNING "offline done who=0x%x\n", who); + + return 0; +} + +void cleanup_module(void) +{ + if (debug) + printk(KERN_WARNING "offline removed\n"); +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * indent-tabs-mode: nil + * End: + */ -- 1.8.4
From: Don Slutz <dslutz@verizon.com> It is possible that hvm_sr_handlers[typecode].save does not use all the provided room. In that case, using: instance * hvm_sr_handlers[typecode].size does not select the correct instance. Add code to search for the correct instance. Signed-off-by: Don Slutz <dslutz@verizon.com> --- xen/common/hvm/save.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c index de76ada..ff6e910 100644 --- a/xen/common/hvm/save.c +++ b/xen/common/hvm/save.c @@ -112,13 +112,27 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance, d->domain_id, typecode); rv = -EFAULT; } - else if ( copy_to_guest(handle, - ctxt.data - + (instance * hvm_sr_handlers[typecode].size) - + sizeof (struct hvm_save_descriptor), - hvm_sr_handlers[typecode].size - - sizeof (struct hvm_save_descriptor)) ) - rv = -EFAULT; + else + { + uint32_t off; + + rv = -EBADSLT; + for (off = 0; off < ctxt.cur; off += hvm_sr_handlers[typecode].size) { + struct hvm_save_descriptor *desc + = (struct hvm_save_descriptor *)&ctxt.data[off]; + if (instance == desc->instance) { + rv = 0; + if ( copy_to_guest(handle, + ctxt.data + + off + + sizeof (struct hvm_save_descriptor), + hvm_sr_handlers[typecode].size + - sizeof (struct hvm_save_descriptor)) ) + rv = -EFAULT; + break; + } + } + } xfree(ctxt.data); return rv; -- 1.8.4
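To make the indexing problem concrete, here is a small standalone demonstration. It is an illustration only: the typecode, record size and buffer layout are invented and none of this is Xen code. Once one instance (an offline VCPU) is missing from the save buffer, a fixed-stride "instance * size" lookup lands on the wrong record, while scanning the descriptors, as the patch now does, finds the right one:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Stand-in for struct hvm_save_descriptor; all values are made up. */
    struct demo_desc {
        uint16_t typecode;
        uint16_t instance;
        uint32_t length;
    };

    #define PAYLOAD 8                           /* pretend record body size */
    #define STRIDE  (sizeof(struct demo_desc) + PAYLOAD)

    int main(void)
    {
        uint8_t buf[3 * STRIDE];
        uint16_t present[3] = { 0, 2, 3 };      /* instance 1 is offline */
        unsigned int i;
        size_t off;

        memset(buf, 0, sizeof(buf));
        for ( i = 0; i < 3; i++ )
        {
            struct demo_desc d = { 42, present[i], PAYLOAD };
            memcpy(buf + i * STRIDE, &d, sizeof(d));
        }

        /* Old lookup: fixed stride, no check of the descriptor. */
        const struct demo_desc *fixed = (const void *)(buf + 2 * STRIDE);
        printf("fixed-stride lookup of instance 2 returned instance %u\n",
               fixed->instance);

        /* Patched lookup: walk the descriptors until the instance matches. */
        for ( off = 0; off < sizeof(buf); off += STRIDE )
        {
            const struct demo_desc *d = (const void *)(buf + off);
            if ( d->instance == 2 )
                printf("descriptor scan found instance 2 at offset %zu\n", off);
        }
        return 0;
    }

Running it reports that the fixed-stride lookup returned instance 3, while the descriptor scan finds instance 2 one slot earlier. This is the same kind of mismatch check-hvmctx reports for a domain with an offlined VCPU.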
Don Slutz
2013-Dec-12 00:56 UTC
[BUGFIX][PATCH 4/4] hvm_save_one: allow the 2nd instance to be fetched for PIC.
From: Don Slutz <dslutz@verizon.com> In this case hvm_sr_handlers[typecode].size is 2x the struct size. So use the saved size (length) to look at each possible instance. Signed-off-by: Don Slutz <dslutz@verizon.com> --- xen/common/hvm/save.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c index ff6e910..f4138b4 100644 --- a/xen/common/hvm/save.c +++ b/xen/common/hvm/save.c @@ -98,9 +98,6 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance, else sz = hvm_sr_handlers[typecode].size; - if ( (instance + 1) * hvm_sr_handlers[typecode].size > sz ) - return -EINVAL; - ctxt.size = sz; ctxt.data = xmalloc_bytes(sz); if ( !ctxt.data ) @@ -114,20 +111,20 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance, } else { - uint32_t off; + uint32_t off, add; rv = -EBADSLT; - for (off = 0; off < ctxt.cur; off += hvm_sr_handlers[typecode].size) { + for (off = 0; off < ctxt.cur; off += add + sizeof (struct hvm_save_descriptor)) { struct hvm_save_descriptor *desc = (struct hvm_save_descriptor *)&ctxt.data[off]; + add = desc->length; if (instance == desc->instance) { rv = 0; if ( copy_to_guest(handle, ctxt.data + off + sizeof (struct hvm_save_descriptor), - hvm_sr_handlers[typecode].size - - sizeof (struct hvm_save_descriptor)) ) + add) ) rv = -EFAULT; break; } -- 1.8.4
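With patches 3 and 4 applied, a caller should be able to ask for the slave PIC directly. Below is a rough sketch of such a request; it is not code from this series, and it assumes the xc_domain_hvm_getcontext_partial() prototype and the HVM save macros that check-hvmctx uses:

    #include <stdio.h>
    #include <xenctrl.h>
    #include <xen/hvm/save.h>

    /* Read the second PIC instance (the slave, instance 1) of an HVM guest. */
    static int dump_slave_pic(xc_interface *xch, uint32_t domid)
    {
        HVM_SAVE_TYPE(PIC) pic;

        if ( xc_domain_hvm_getcontext_partial(xch, domid,
                                              HVM_SAVE_CODE(PIC),
                                              1 /* slave */,
                                              &pic, sizeof(pic)) != 0 )
            return -1;  /* before this patch: -EINVAL ("Invalid argument") */

        printf("PIC1: IRQ base %#x, irr %#x, imr %#x, isr %#x\n",
               pic.irq_base, pic.irr, pic.imr, pic.isr);
        return 0;
    }

The "Invalid argument" failure in the cover letter comes from the size check removed above: with hvm_sr_handlers[typecode].size covering both PICs, any instance other than 0 used to be rejected.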
On Wed, 2013-12-11 at 19:56 -0500, Don Slutz wrote:
> From: Don Slutz <dslutz@verizon.com>
>
> This is a quick and dirty linux kernel module to offline some cpus
> as far as Xen knows.
>
> This can be used to re-create the bug in hvm_save_one about
> returning the wrong data.

Can this not be done via /sys/devices/system/cpu/cpu1/online etc?
Documentation/cpu-hotplug.txt seems to suggest it can.

Ian.
On 12/12/13 10:01, Ian Campbell wrote:
> On Wed, 2013-12-11 at 19:56 -0500, Don Slutz wrote:
>> From: Don Slutz <dslutz@verizon.com>
>>
>> This is a quick and dirty linux kernel module to offline some cpus
>> as far as Xen knows.
>>
>> This can be used to re-create the bug in hvm_save_one about
>> returning the wrong data.
>
> Can this not be done via /sys/devices/system/cpu/cpu1/online etc?
> Documentation/cpu-hotplug.txt seems to suggest it can.

Yes it can.

echo 0 > /sys/devices/system/cpu/cpuN/online

And onlined again with 1.

David
On 12/12/13 06:09, David Vrabel wrote:
> On 12/12/13 10:01, Ian Campbell wrote:
>> On Wed, 2013-12-11 at 19:56 -0500, Don Slutz wrote:
>>> From: Don Slutz <dslutz@verizon.com>
>>>
>>> This is a quick and dirty linux kernel module to offline some cpus
>>> as far as Xen knows.
>>>
>>> This can be used to re-create the bug in hvm_save_one about
>>> returning the wrong data.
>>
>> Can this not be done via /sys/devices/system/cpu/cpu1/online etc?
>> Documentation/cpu-hotplug.txt seems to suggest it can.
>
> Yes it can.
>
> echo 0 > /sys/devices/system/cpu/cpuN/online
>
> And onlined again with 1.
>
> David

I had forgotten about that when I was testing. At least on 2.6.18-128.el5, cpu 0 does not work:

[root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu0/online
-bash: /sys/devices/system/cpu/cpu0/online: Permission denied
[root@P-8-0 ~]# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel) context=root:system_r:unconfined_t:SystemLow-SystemHigh
[root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu1/online
[root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu3/online
[root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu5/online
[root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu7/online

And making cpu 0 offline was something I wanted to test.
   -Don Slutz

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
On 12/12/13 09:24, Don Slutz wrote:
> On 12/12/13 06:09, David Vrabel wrote:
>> On 12/12/13 10:01, Ian Campbell wrote:
>>> On Wed, 2013-12-11 at 19:56 -0500, Don Slutz wrote:
>>>> From: Don Slutz <dslutz@verizon.com>
>>>>
>>>> This is a quick and dirty linux kernel module to offline some cpus
>>>> as far as Xen knows.
>>>>
>>>> This can be used to re-create the bug in hvm_save_one about
>>>> returning the wrong data.

Oops, I should have added "No need to accept or apply this patch." here
and in the cover letter.
   -Don Slutz

>>>
>>> Can this not be done via /sys/devices/system/cpu/cpu1/online etc?
>>> Documentation/cpu-hotplug.txt seems to suggest it can.
>> Yes it can.
>>
>> echo 0 > /sys/devices/system/cpu/cpuN/online
>>
>> And onlined again with 1.
>>
>> David
> I had forgotten about that when I was testing. At least on 2.6.18-128.el5, cpu 0 does not work:
>
> [root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu0/online
> -bash: /sys/devices/system/cpu/cpu0/online: Permission denied
> [root@P-8-0 ~]# id
> uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel) context=root:system_r:unconfined_t:SystemLow-SystemHigh
> [root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu1/online
> [root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu3/online
> [root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu5/online
> [root@P-8-0 ~]# echo 0 > /sys/devices/system/cpu/cpu7/online
>
> And making cpu 0 offline was something I wanted to test.
>    -Don Slutz

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Jan Beulich
2013-Dec-13 14:20 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
>>> On 12.12.13 at 01:56, Don Slutz <dslutz@verizon.com> wrote:
> From: Don Slutz <dslutz@verizon.com>
>
> It is possible that hvm_sr_handlers[typecode].save does not use all
> the provided room. In that case, using:
>
>    instance * hvm_sr_handlers[typecode].size
>
> does not select the correct instance. Add code to search for the
> correct instance.
>
> Signed-off-by: Don Slutz <dslutz@verizon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

But this needs to be cleaned up coding style wise and ...

> --- a/xen/common/hvm/save.c
> +++ b/xen/common/hvm/save.c
> @@ -112,13 +112,27 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>                 d->domain_id, typecode);
>          rv = -EFAULT;
>      }
> -    else if ( copy_to_guest(handle,
> -                            ctxt.data
> -                            + (instance * hvm_sr_handlers[typecode].size)
> -                            + sizeof (struct hvm_save_descriptor),
> -                            hvm_sr_handlers[typecode].size
> -                            - sizeof (struct hvm_save_descriptor)) )
> -        rv = -EFAULT;
> +    else
> +    {
> +        uint32_t off;
> +
> +        rv = -EBADSLT;
> +        for (off = 0; off < ctxt.cur; off += hvm_sr_handlers[typecode].size) {
> +            struct hvm_save_descriptor *desc
> +                = (struct hvm_save_descriptor *)&ctxt.data[off];

.. this could be const, and the cast could simply be (void *), ...

> +            if (instance == desc->instance) {
> +                rv = 0;
> +                if ( copy_to_guest(handle,
> +                                   ctxt.data
> +                                   + off

... this doesn't need to be on a separate line, and ...

> +                                   + sizeof (struct hvm_save_descriptor),
> +                                   hvm_sr_handlers[typecode].size
> +                                   - sizeof (struct hvm_save_descriptor)) )

... both these sizeof()s would now better be sizeof(*desc).

Jan
Jan Beulich
2013-Dec-13 14:38 UTC
Re: [BUGFIX][PATCH 4/4] hvm_save_one: allow the 2nd instance to be fetched for PIC.
>>> On 12.12.13 at 01:56, Don Slutz <dslutz@verizon.com> wrote:
> @@ -114,20 +111,20 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>      }
>      else
>      {
> -        uint32_t off;
> +        uint32_t off, add;

"add" is a pretty odd name for what this is being used for. Why
don't you use desc->length directly?

>
>          rv = -EBADSLT;
> -        for (off = 0; off < ctxt.cur; off += hvm_sr_handlers[typecode].size) {
> +        for (off = 0; off < ctxt.cur; off += add + sizeof (struct hvm_save_descriptor)) {
>              struct hvm_save_descriptor *desc
>                  = (struct hvm_save_descriptor *)&ctxt.data[off];
> +            add = desc->length;
>              if (instance == desc->instance) {
>                  rv = 0;
>                  if ( copy_to_guest(handle,
>                                     ctxt.data
>                                     + off
>                                     + sizeof (struct hvm_save_descriptor),
> -                                   hvm_sr_handlers[typecode].size
> -                                   - sizeof (struct hvm_save_descriptor)) )
> +                                   add) )

Further, the way this was done before means that for multi-
instance records all records would get copied out, but the
caller would have no way of knowing that (except by implying
that behavior, e.g. "known to be the case for PIC").

Which shows another shortcoming of the interface: There's no
buffer size being passed in from the caller, yet we have variable
size records. Which means latent data corruption in the caller.

Hence I think rather than complicating the logic here, we should
change the interface to pass a size in and back out, which will
not only avoid corrupting memory, but also allow the guest to
recognize multi-instance data being returned.

Jan
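As a purely hypothetical illustration of the "size in, size out" idea (nothing below is existing Xen or libxc code, and all names are invented for the sketch): the caller passes its buffer length in and gets the actual record length back, so a too-small buffer or a multi-instance record becomes visible instead of silently overrunning the caller:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    /* Copy a record into dst if it fits; always report the real size back. */
    static int copy_record(void *dst, uint32_t *inout_size,
                           const void *record, uint32_t record_size)
    {
        uint32_t have = *inout_size;

        *inout_size = record_size;      /* size "back out" to the caller */
        if ( record_size > have )
            return -1;                  /* caller's buffer is too small */
        memcpy(dst, record, record_size);
        return 0;
    }

    int main(void)
    {
        uint8_t both_pics[16] = { 0 };  /* e.g. two 8-byte PIC instances */
        uint8_t buf[8];
        uint32_t size = sizeof(buf);

        if ( copy_record(buf, &size, both_pics, sizeof(both_pics)) )
            printf("buffer too small: need %u bytes, have %zu\n",
                   size, sizeof(buf));
        return 0;
    }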
Don Slutz
2013-Dec-15 00:29 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
On 12/13/13 09:20, Jan Beulich wrote:>>>> On 12.12.13 at 01:56, Don Slutz <dslutz@verizon.com> wrote:>> From: Don Slutz <dslutz@verizon.com> >> >> It is possible that hvm_sr_handlers[typecode].save does not use >> all the provided room. In that case, using: >> >> instance * hvm_sr_handlers[typecode].size >> >> does not select the correct instance. Add code to search for the >> correct instance. >> >> Signed-off-by: Don Slutz <dslutz@verizon.com> > > Reviewed-by: Jan Beulich <jbeulich@suse.com> > > But this needs to be cleaned up coding style wise and ... > >> --- a/xen/common/hvm/save.c +++ b/xen/common/hvm/save.c @@ -112,13 >> +112,27 @@ int hvm_save_one(struct domain *d, uint16_t typecode, >> uint16_t instance, d->domain_id, typecode); rv = -EFAULT; } - >> else if ( copy_to_guest(handle, - >> ctxt.data - + (instance * >> hvm_sr_handlers[typecode].size) - + >> sizeof (struct hvm_save_descriptor), - >> hvm_sr_handlers[typecode].size - - >> sizeof (struct hvm_save_descriptor)) ) - rv = -EFAULT; + >> else + { + uint32_t off; + + rv = -EBADSLT; + >> for (off = 0; off < ctxt.cur; off + >> hvm_sr_handlers[typecode].size) { + struct >> hvm_save_descriptor *desc + = (struct >> hvm_save_descriptor *)&ctxt.data[off]; > > .. this could be const, and the cast could simply be (void *), ... > >> + if (instance == desc->instance) { + rv >> = 0; + if ( copy_to_guest(handle, + >> ctxt.data + + off > > ... this doesn''t need to be on a separate line, and ... > >> + + sizeof (struct >> hvm_save_descriptor), + >> hvm_sr_handlers[typecode].size + >> - sizeof (struct hvm_save_descriptor)) ) > > ... both these sizeof()s would now better be sizeof(*desc). > > Jan > I think I have corrected all coding errors (please check again). And done all requested changes. I did add the reviewed by (not sure if I should since this changes a large part of the patch, but they are all what Jan said). I have unit tested it and it appears to work the same as the previous version (as expected). Here is the new version, also attached. From e0e8f5246ba492b153884cea93bfe753f1b0782e Mon Sep 17 00:00:00 2001 From: Don Slutz <dslutz@verizon.com> Date: Tue, 12 Nov 2013 08:22:53 -0500 Subject: [PATCH v2 3/4] hvm_save_one: return correct data. It is possible that hvm_sr_handlers[typecode].save does not use all the provided room. In that case, using: instance * hvm_sr_handlers[typecode].size does not select the correct instance. Add code to search for the correct instance. 
Signed-off-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> --- xen/common/hvm/save.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c index de76ada..6aaea6f 100644 --- a/xen/common/hvm/save.c +++ b/xen/common/hvm/save.c @@ -112,13 +112,27 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance, d->domain_id, typecode); rv = -EFAULT; } - else if ( copy_to_guest(handle, - ctxt.data - + (instance * hvm_sr_handlers[typecode].size) - + sizeof (struct hvm_save_descriptor), - hvm_sr_handlers[typecode].size - - sizeof (struct hvm_save_descriptor)) ) - rv = -EFAULT; + else + { + uint32_t off; + + rv = -EBADSLT; + for ( off = 0; off < ctxt.cur; off += hvm_sr_handlers[typecode].size ) + { + const struct hvm_save_descriptor *desc = (void *)&ctxt.data[off]; + + if ( instance == desc->instance ) + { + rv = 0; + if ( copy_to_guest(handle, + ctxt.data + off + sizeof(*desc), + hvm_sr_handlers[typecode].size + - sizeof(*desc)) ) + rv = -EFAULT; + break; + } + } + } xfree(ctxt.data); return rv; -- 1.7.11.7 -Don Slutz _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Don Slutz
2013-Dec-15 01:38 UTC
Re: [BUGFIX][PATCH 4/4] hvm_save_one: allow the 2nd instance to be fetched for PIC.
On 12/13/13 09:38, Jan Beulich wrote:>>>> On 12.12.13 at 01:56, Don Slutz <dslutz@verizon.com> wrote: >> @@ -114,20 +111,20 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance, >> } >> else >> { >> - uint32_t off; >> + uint32_t off, add; > "add" is a pretty odd name for what this is being used for. Why > don''t you use desc->length directly?I picked the name when the 3rd part of the "for" was "off += add". During unit testing that did not work and so did not try and pick a new one. I could have added add2: const uint32_t add2 = sizeof (struct hvm_save_descriptor); And then the last part of the for becomes "off += add + add2". I can not use desc->length because: save.c: In function ''hvm_save_one'': save.c:120:47: error: ''desc'' undeclared (first use in this function) save.c:120:47: note: each undeclared identifier is reported only once for each function it appears in (It is only defined in the body of the for).>> >> rv = -EBADSLT; >> - for (off = 0; off < ctxt.cur; off += hvm_sr_handlers[typecode].size) { >> + for (off = 0; off < ctxt.cur; off += add + sizeof (struct hvm_save_descriptor)) { >> struct hvm_save_descriptor *desc >> = (struct hvm_save_descriptor *)&ctxt.data[off]; >> + add = desc->length; >> if (instance == desc->instance) { >> rv = 0; >> if ( copy_to_guest(handle, >> ctxt.data >> + off >> + sizeof (struct hvm_save_descriptor), >> - hvm_sr_handlers[typecode].size >> - - sizeof (struct hvm_save_descriptor)) ) >> + add) ) > Further, the way this was done before means that for multi- > instance records all records would get copied out, but the > caller would have no way of knowing that (except by implying > that behavior, e.g. "known to be the case for PIC").Not exactly. Using the following adjustment to check-hvmctx (patch #1): cb4d0e75a497f169c8419b30c1954f92bb8e29a8 Mon Sep 17 00:00:00 2001 From: Don Slutz <dslutz@verizon.com> Date: Sat, 14 Dec 2013 19:31:16 -0500 Subject: [PATCH] check-hvmctx: add check for extra data under debug 8 Signed-off-by: Don Slutz <dslutz@verizon.com> --- tools/tests/check-hvmctx/check-hvmctx.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/tools/tests/check-hvmctx/check-hvmctx.c b/tools/tests/check-hvmctx/check-hvmctx.c index 9b5c3aa..668b77a 100644 --- a/tools/tests/check-hvmctx/check-hvmctx.c +++ b/tools/tests/check-hvmctx/check-hvmctx.c @@ -574,7 +574,7 @@ int main(int argc, char **argv) default: if (xc_domain_hvm_getcontext_partial( xch, domid, desc.typecode, desc.instance, - tbuf, desc.length) != 0) { + tbuf, len) != 0) { fprintf(stderr, "Error: xc_domain_hvm_getcontext_partial: entry %i: type %u instance %u, length %u: ", entry, (unsigned) desc.typecode, (unsigned) desc.instance, (unsigned) desc.length); @@ -582,6 +582,23 @@ int main(int argc, char **argv) retval = 42; memset(tbuf, 0xee, desc.length); } + if (debug & 0x08) { + int i; + int header = 1; + + for (i = desc.length; i < len; i++) { + if (tbuf[i]) { + if (header) { + header = 0; + printf("Error: entry %i: type %u instance %u, length %u extra data!\n", + entry, (unsigned) desc.typecode, + (unsigned) desc.instance, (unsigned) desc.length); + } + printf("[%03x] unexpected data=%02x\n", + i, tbuf[i]); + } + } + } ret = desc.length; #ifndef NOTCLEAN if (desc.typecode == HVM_SAVE_CODE(CPU)) -- 1.7.11.7 and before this patch: dcs-xen-51:~/xen/tools/tests/check-hvmctx>make clean;make CHECK_HVMCTX="-DDOPIC" rm -f *.o check-hvmctx *~ .*.d gcc -O1 -fno-omit-frame-pointer -m64 -g -fno-strict-aliasing -std=gnu99 -Wall 
-Wstrict-prototypes -Wdeclaration-after-statement -Wno-unused-but-set-variable -Wno-unused-local-typedefs -D__XEN_TOOLS__ -MMD -MF .check-hvmctx.o.d -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -fno-optimize-sibling-calls -Werror -I/home/don/xen/tools/tests/check-hvmctx/../../../tools/libxc -I/home/don/xen/tools/tests/check-hvmctx/../../../tools/include -I/home/don/xen/tools/tests/check-hvmctx/../../../tools/include -I/home/don/xen/tools/tests/check-hvmctx/../../../tools/xenstore -I/home/don/xen/tools/tests/check-hvmctx/../../../tools/include -I/home/don/xen/tools/tests/check-hvmctx/../../../tools -DDOPIC -c -o check-hvmctx.o check-hvmctx.c gcc -o check-hvmctx check-hvmctx.o /home/don/xen/tools/tests/check-hvmctx/../../../tools/libxc/libxenctrl.so dcs-xen-51:~/xen/tools/tests/check-hvmctx>sudo ./check-hvmctx 1 8 Check HVM save record vs partial for domain 1 Error: entry 8: type 3 instance 0, length 8 extra data! [008] unexpected data=03 [00a] unexpected data=01 [00c] unexpected data=08 [010] unexpected data=01 [011] unexpected data=ff [013] unexpected data=28 [016] unexpected data=0c Error: xc_domain_hvm_getcontext_partial: entry 9: type 3 instance 1, length 8: Invalid argument Error: entry 9: type 3 instance 1, length 8 mismatch! PIC: IRQ base 0x28, irr 0x1, imr 0xff, isr 0 I see that after the expected length (8), there is a struct hvm_save_descriptor (type 3 instance 1, length 8) followed by another HVM_SAVE_TYPE(PIC).> Which shows another shortcoming of the interface: There''s no > buffer size being passed in from the caller, yet we have variable > size records. Which means latent data corruption in the caller.This is not 100% correct. The libxl code for xc_domain_hvm_getcontext_partial does take a length: /* Get just one element of the HVM guest context. * size must be >= HVM_SAVE_LENGTH(type) */ int xc_domain_hvm_getcontext_partial(xc_interface *xch, uint32_t domid, uint16_t typecode, uint16_t instance, void *ctxt_buf, uint32_t size) { and it gets embeded in DECLARE_HYPERCALL_BOUNCE(ctxt_buf, size, XC_HYPERCALL_BUFFER_BOUNCE_OUT); which is handle. I do not know that much about "XEN_GUEST_HANDLE_64(uint8) handle" and DECLARE_HYPERCALL_BOUNCE, but what my unit testing has shown me is that copy_to_guest(handle, <ptr>, <nr>) does only copy up to the size stored in handle. It looks to zero an unknown amount more (looks to be page sized). So there is more needed here.> Hence I think rather than complicating the logic here, we should > change the interface to pass a size in and back out, which will > not only avoid corrupting memory, but also allow the guest to > recognize multi-instance data being returned.The size is passed in, just not passed out. And the fact that the data is: HVM_SAVE_TYPE(PIC) struct hvm_save_descriptor HVM_SAVE_TYPE(PIC) Is strange to me. One descriptor for two entries. This was the primary reason I went the way I did. I am not saying that the interface should not change; I just do not see that as a bug fix type change. -Don Slutz> Jan > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Andrew Cooper
2013-Dec-15 16:51 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
On 15/12/2013 00:29, Don Slutz wrote:> > I think I have corrected all coding errors (please check again). And > done all requested changes. I did add the reviewed by (not sure if I > should since this changes a large part of the patch, but they are all > what Jan said). > > I have unit tested it and it appears to work the same as the previous > version (as expected). > > Here is the new version, also attached. > > From e0e8f5246ba492b153884cea93bfe753f1b0782e Mon Sep 17 00:00:00 2001 > From: Don Slutz <dslutz@verizon.com> > Date: Tue, 12 Nov 2013 08:22:53 -0500 > Subject: [PATCH v2 3/4] hvm_save_one: return correct data. > > It is possible that hvm_sr_handlers[typecode].save does not use all > the provided room. In that case, using: > > instance * hvm_sr_handlers[typecode].size > > does not select the correct instance. Add code to search for the > correct instance. > > Signed-off-by: Don Slutz <dslutz@verizon.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com>but this fairs no better at selecting the correct subset in the case that less data than hvm_sr_handlers[typecode].size is written by hvm_sr_handlers[typecode].save. It always increments by ''size'' bytes, and will only copy the data back if the bytes under desc->instance happen to match the instance we are looking for. The only solution I can see is that for the per-vcpu records, the save functions get refactored to take an instance ID, and only save their specific instance. I shall have a go at this and see how invasive it is. ~Andrew> --- > xen/common/hvm/save.c | 28 +++++++++++++++++++++------- > 1 file changed, 21 insertions(+), 7 deletions(-) > > diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c > index de76ada..6aaea6f 100644 > --- a/xen/common/hvm/save.c > +++ b/xen/common/hvm/save.c > @@ -112,13 +112,27 @@ int hvm_save_one(struct domain *d, uint16_t > typecode, uint16_t instance, > d->domain_id, typecode); > rv = -EFAULT; > } > - else if ( copy_to_guest(handle, > - ctxt.data > - + (instance * > hvm_sr_handlers[typecode].size) > - + sizeof (struct hvm_save_descriptor), > - hvm_sr_handlers[typecode].size > - - sizeof (struct hvm_save_descriptor)) ) > - rv = -EFAULT; > + else > + { > + uint32_t off; > + > + rv = -EBADSLT; > + for ( off = 0; off < ctxt.cur; off +> hvm_sr_handlers[typecode].size ) > + { > + const struct hvm_save_descriptor *desc = (void > *)&ctxt.data[off]; > + > + if ( instance == desc->instance ) > + { > + rv = 0; > + if ( copy_to_guest(handle, > + ctxt.data + off + sizeof(*desc), > + hvm_sr_handlers[typecode].size > + - sizeof(*desc)) ) > + rv = -EFAULT; > + break; > + } > + } > + } > > xfree(ctxt.data); > return rv; > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Don Slutz
2013-Dec-15 17:19 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
On 12/15/13 11:51, Andrew Cooper wrote:> On 15/12/2013 00:29, Don Slutz wrote: >> >> I think I have corrected all coding errors (please check again). And >> done all requested changes. I did add the reviewed by (not sure if I >> should since this changes a large part of the patch, but they are all >> what Jan said). >> >> I have unit tested it and it appears to work the same as the previous >> version (as expected). >> >> Here is the new version, also attached. >> >> From e0e8f5246ba492b153884cea93bfe753f1b0782e Mon Sep 17 00:00:00 2001 >> From: Don Slutz <dslutz@verizon.com> >> Date: Tue, 12 Nov 2013 08:22:53 -0500 >> Subject: [PATCH v2 3/4] hvm_save_one: return correct data. >> >> It is possible that hvm_sr_handlers[typecode].save does not use all >> the provided room. In that case, using: >> >> instance * hvm_sr_handlers[typecode].size >> >> does not select the correct instance. Add code to search for the >> correct instance. >> >> Signed-off-by: Don Slutz <dslutz@verizon.com> >> Reviewed-by: Jan Beulich <jbeulich@suse.com> > > but this fairs no better at selecting the correct subset in the case > that less data than hvm_sr_handlers[typecode].size is written by > hvm_sr_handlers[typecode].save. >True, but the inverse is the case here; .save writes ''n'' ''size'' blocks. Form the loop above: if ( hvm_sr_handlers[typecode].kind == HVMSR_PER_VCPU ) for_each_vcpu(d, v) sz += hvm_sr_handlers[typecode].size; else sz = hvm_sr_handlers[typecode].size; so sz is in multiples of ''size''. Normally sz == ctxt.cur. With some offline vcpus it write fewer ''size'' blocks.> It always increments by ''size'' bytes, and will only copy the data back > if the bytes under desc->instance happen to match the instance we are > looking for. >The only time it does not find one is for an offline vcpu. Try out the unit test code in patch #1 on an unchanged xen. It should not display anything. Then offline a cpu in a domU (echo 0 > /sys/devices/system/cpu/cpu1/online). And with 3 vcpus, it will report an error. -Don Slutz> The only solution I can see is that for the per-vcpu records, the save > functions get refactored to take an instance ID, and only save their > specific instance. > > I shall have a go at this and see how invasive it is. 
> > ~Andrew > >> --- >> xen/common/hvm/save.c | 28 +++++++++++++++++++++------- >> 1 file changed, 21 insertions(+), 7 deletions(-) >> >> diff --git a/xen/common/hvm/save.c b/xen/common/hvm/save.c >> index de76ada..6aaea6f 100644 >> --- a/xen/common/hvm/save.c >> +++ b/xen/common/hvm/save.c >> @@ -112,13 +112,27 @@ int hvm_save_one(struct domain *d, uint16_t >> typecode, uint16_t instance, >> d->domain_id, typecode); >> rv = -EFAULT; >> } >> - else if ( copy_to_guest(handle, >> - ctxt.data >> - + (instance * >> hvm_sr_handlers[typecode].size) >> - + sizeof (struct hvm_save_descriptor), >> - hvm_sr_handlers[typecode].size >> - - sizeof (struct hvm_save_descriptor)) ) >> - rv = -EFAULT; >> + else >> + { >> + uint32_t off; >> + >> + rv = -EBADSLT; >> + for ( off = 0; off < ctxt.cur; off += >> hvm_sr_handlers[typecode].size ) >> + { >> + const struct hvm_save_descriptor *desc = (void >> *)&ctxt.data[off]; >> + >> + if ( instance == desc->instance ) >> + { >> + rv = 0; >> + if ( copy_to_guest(handle, >> + ctxt.data + off + sizeof(*desc), >> + hvm_sr_handlers[typecode].size >> + - sizeof(*desc)) ) >> + rv = -EFAULT; >> + break; >> + } >> + } >> + } >> >> xfree(ctxt.data); >> return rv; >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Andrew Cooper
2013-Dec-15 17:22 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
On 15/12/2013 17:19, Don Slutz wrote:> On 12/15/13 11:51, Andrew Cooper wrote: >> On 15/12/2013 00:29, Don Slutz wrote: >>> >>> I think I have corrected all coding errors (please check again). And >>> done all requested changes. I did add the reviewed by (not sure if >>> I should since this changes a large part of the patch, but they are >>> all what Jan said). >>> >>> I have unit tested it and it appears to work the same as the >>> previous version (as expected). >>> >>> Here is the new version, also attached. >>> >>> From e0e8f5246ba492b153884cea93bfe753f1b0782e Mon Sep 17 00:00:00 2001 >>> From: Don Slutz <dslutz@verizon.com> >>> Date: Tue, 12 Nov 2013 08:22:53 -0500 >>> Subject: [PATCH v2 3/4] hvm_save_one: return correct data. >>> >>> It is possible that hvm_sr_handlers[typecode].save does not use all >>> the provided room. In that case, using: >>> >>> instance * hvm_sr_handlers[typecode].size >>> >>> does not select the correct instance. Add code to search for the >>> correct instance. >>> >>> Signed-off-by: Don Slutz <dslutz@verizon.com> >>> Reviewed-by: Jan Beulich <jbeulich@suse.com> >> >> but this fairs no better at selecting the correct subset in the case >> that less data than hvm_sr_handlers[typecode].size is written by >> hvm_sr_handlers[typecode].save. >> > True, but the inverse is the case here; .save writes ''n'' ''size'' > blocks. Form the loop above: > > if ( hvm_sr_handlers[typecode].kind == HVMSR_PER_VCPU ) > for_each_vcpu(d, v) > sz += hvm_sr_handlers[typecode].size; > else > sz = hvm_sr_handlers[typecode].size; > > so sz is in multiples of ''size''. Normally sz == ctxt.cur. With some > offline vcpus it write fewer ''size'' blocks. >> It always increments by ''size'' bytes, and will only copy the data >> back if the bytes under desc->instance happen to match the instance >> we are looking for. >> > The only time it does not find one is for an offline vcpu. Try out > the unit test code in patch #1 on an unchanged xen. It should not > display anything. Then offline a cpu in a domU (echo 0 > > /sys/devices/system/cpu/cpu1/online). And with 3 vcpus, it will > report an error. > > -Don SlutzAh - so there are actually two problems. I see now the one you are trying to solve, and would agree that your code does solve it. However, some of the save handlers are themselves variable length, and will write records shorter than hvm_sr_handlers[typecode].size if they can get away with doing so. In this case, the new logic still wont get the correct instance. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
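To make the offline-vcpu problem concrete, here is a small stand-alone illustration (not Xen code; the struct and values are invented for the example): when one per-vcpu instance is missing from the packed save buffer, indexing by instance number picks the wrong record, while searching the descriptors, as patch 3/4 does, finds the right one.

    /* demo.c - illustration only; build with: gcc -Wall -o demo demo.c */
    #include <stdio.h>
    #include <stdint.h>

    struct rec {
        uint16_t instance;     /* stands in for hvm_save_descriptor.instance */
        char payload[8];
    };

    int main(void)
    {
        /* vcpu 1 offline: the save handler emitted instances 0, 2, 3 only */
        struct rec buf[] = { { 0, "vcpu0" }, { 2, "vcpu2" }, { 3, "vcpu3" } };
        unsigned int nr = sizeof(buf) / sizeof(buf[0]);
        uint16_t want = 2;
        unsigned int i;

        /* naive lookup: assume instance 'want' lives at index 'want' */
        printf("naive  -> instance %u\n", (unsigned int)buf[want].instance);      /* prints 3 */

        /* search the descriptors, as the patch does */
        for ( i = 0; i < nr; i++ )
            if ( buf[i].instance == want )
                printf("search -> instance %u\n", (unsigned int)buf[i].instance); /* prints 2 */

        return 0;
    }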
Don Slutz
2013-Dec-15 17:42 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
On 12/15/13 12:22, Andrew Cooper wrote:> On 15/12/2013 17:19, Don Slutz wrote: >> On 12/15/13 11:51, Andrew Cooper wrote: >>> On 15/12/2013 00:29, Don Slutz wrote: >>>> >>>> I think I have corrected all coding errors (please check again). >>>> And done all requested changes. I did add the reviewed by (not >>>> sure if I should since this changes a large part of the patch, but >>>> they are all what Jan said). >>>> >>>> I have unit tested it and it appears to work the same as the >>>> previous version (as expected). >>>> >>>> Here is the new version, also attached. >>>> >>>> From e0e8f5246ba492b153884cea93bfe753f1b0782e Mon Sep 17 00:00:00 2001 >>>> From: Don Slutz <dslutz@verizon.com> >>>> Date: Tue, 12 Nov 2013 08:22:53 -0500 >>>> Subject: [PATCH v2 3/4] hvm_save_one: return correct data. >>>> >>>> It is possible that hvm_sr_handlers[typecode].save does not use all >>>> the provided room. In that case, using: >>>> >>>> instance * hvm_sr_handlers[typecode].size >>>> >>>> does not select the correct instance. Add code to search for the >>>> correct instance. >>>> >>>> Signed-off-by: Don Slutz <dslutz@verizon.com> >>>> Reviewed-by: Jan Beulich <jbeulich@suse.com> >>> >>> but this fairs no better at selecting the correct subset in the case >>> that less data than hvm_sr_handlers[typecode].size is written by >>> hvm_sr_handlers[typecode].save. >>> >> True, but the inverse is the case here; .save writes ''n'' ''size'' >> blocks. Form the loop above: >> >> if ( hvm_sr_handlers[typecode].kind == HVMSR_PER_VCPU ) >> for_each_vcpu(d, v) >> sz += hvm_sr_handlers[typecode].size; >> else >> sz = hvm_sr_handlers[typecode].size; >> >> so sz is in multiples of ''size''. Normally sz == ctxt.cur. With some >> offline vcpus it write fewer ''size'' blocks. >>> It always increments by ''size'' bytes, and will only copy the data >>> back if the bytes under desc->instance happen to match the instance >>> we are looking for. >>> >> The only time it does not find one is for an offline vcpu. Try out >> the unit test code in patch #1 on an unchanged xen. It should not >> display anything. Then offline a cpu in a domU (echo 0 > >> /sys/devices/system/cpu/cpu1/online). And with 3 vcpus, it will >> report an error. >> >> -Don Slutz > > Ah - so there are actually two problems. I see now the one you are > trying to solve, and would agree that your code does solve it. > > However, some of the save handlers are themselves variable length, and > will write records shorter than hvm_sr_handlers[typecode].size if they > can get away with doing so. In this case, the new logic still wont > get the correct instance. >Not sure which one(s) you are referring to. 
From the full dump: xen-hvmctx 1| grep -i entry Entry 0: type 1 instance 0, length 24 Entry 1: type 2 instance 0, length 1024 Entry 2: type 2 instance 2, length 1024 Entry 3: type 2 instance 3, length 1024 Entry 4: type 2 instance 4, length 1024 Entry 5: type 2 instance 5, length 1024 Entry 6: type 2 instance 6, length 1024 Entry 7: type 2 instance 7, length 1024 Entry 8: type 3 instance 0, length 8 Entry 9: type 3 instance 1, length 8 Entry 10: type 4 instance 0, length 400 Entry 11: type 5 instance 0, length 24 Entry 12: type 5 instance 1, length 24 Entry 13: type 5 instance 2, length 24 Entry 14: type 5 instance 3, length 24 Entry 15: type 5 instance 4, length 24 Entry 16: type 5 instance 5, length 24 Entry 17: type 5 instance 6, length 24 Entry 18: type 5 instance 7, length 24 Entry 19: type 6 instance 0, length 1024 Entry 20: type 6 instance 1, length 1024 Entry 21: type 6 instance 2, length 1024 Entry 22: type 6 instance 3, length 1024 Entry 23: type 6 instance 4, length 1024 Entry 24: type 6 instance 5, length 1024 Entry 25: type 6 instance 6, length 1024 Entry 26: type 6 instance 7, length 1024 Entry 27: type 7 instance 0, length 16 Entry 28: type 8 instance 0, length 8 Entry 29: type 9 instance 0, length 8 Entry 30: type 10 instance 0, length 56 Entry 31: type 11 instance 0, length 16 Entry 32: type 12 instance 0, length 1048 Entry 33: type 13 instance 0, length 8 Entry 34: type 14 instance 0, length 240 Entry 35: type 14 instance 1, length 240 Entry 36: type 14 instance 2, length 240 Entry 37: type 14 instance 3, length 240 Entry 38: type 14 instance 4, length 240 Entry 39: type 14 instance 5, length 240 Entry 40: type 14 instance 6, length 240 Entry 41: type 14 instance 7, length 240 Entry 42: type 16 instance 0, length 856 Entry 43: type 16 instance 1, length 856 Entry 44: type 16 instance 2, length 856 Entry 45: type 16 instance 3, length 856 Entry 46: type 16 instance 4, length 856 Entry 47: type 16 instance 5, length 856 Entry 48: type 16 instance 6, length 856 Entry 49: type 16 instance 7, length 856 Entry 50: type 18 instance 0, length 24 Entry 51: type 18 instance 1, length 24 Entry 52: type 18 instance 2, length 24 Entry 53: type 18 instance 3, length 24 Entry 54: type 18 instance 4, length 24 Entry 55: type 18 instance 5, length 24 Entry 56: type 18 instance 6, length 24 Entry 57: type 18 instance 7, length 24 Entry 58: type 19 instance 0, length 8 Entry 59: type 19 instance 1, length 8 Entry 60: type 19 instance 2, length 8 Entry 61: type 19 instance 3, length 8 Entry 62: type 19 instance 4, length 8 Entry 63: type 19 instance 5, length 8 Entry 64: type 19 instance 6, length 8 Entry 65: type 19 instance 7, length 8 Entry 66: type 0 instance 0, length 0 All typecode''s appear to save the same amount per instance. Most use hvm_save_entry: ... _hvm_write_entry((_h), (_src), HVM_SAVE_LENGTH(_x)); \ and /* Syntactic sugar around that function: specify the max number of * saves, and this calculates the size of buffer needed */ #define HVM_REGISTER_SAVE_RESTORE(_x, _save, _load, _num, _k) \ static int __init __hvm_register_##_x##_save_and_restore(void) \ { \ hvm_register_savevm(HVM_SAVE_CODE(_x), \ #_x, \ &_save, \ &_load, \ (_num) * (HVM_SAVE_LENGTH(_x) \ + sizeof (struct hvm_save_descriptor)), \ _k); \ return 0; \ } \ I do not find any that call on _hvm_write_entry directly. 
The only special one I found: CPU_XSAVE_CODE Still "writes" a full sized entry: if ( _hvm_init_entry(h, CPU_XSAVE_CODE, v->vcpu_id, HVM_CPU_XSAVE_SIZE) ) return 1; ctxt = (struct hvm_hw_cpu_xsave *)&h->data[h->cur]; h->cur += HVM_CPU_XSAVE_SIZE; memset(ctxt, 0, HVM_CPU_XSAVE_SIZE); It then modifies the zeros conditionaly. if ( v->fpu_initialised ) memcpy(&ctxt->save_area, v->arch.xsave_area, xsave_cntxt_size); #define HVM_CPU_XSAVE_SIZE (3 * sizeof(uint64_t) + xsave_cntxt_size) is part of this. /* We need variable length data chunk for xsave area, hence customized * declaration other than HVM_REGISTER_SAVE_RESTORE. */ static int __init __hvm_register_CPU_XSAVE_save_and_restore(void) { hvm_register_savevm(CPU_XSAVE_CODE, "CPU_XSAVE", hvm_save_cpu_xsave_states, hvm_load_cpu_xsave_states, HVM_CPU_XSAVE_SIZE + sizeof (struct hvm_save_descriptor), HVMSR_PER_VCPU); return 0; } __initcall(__hvm_register_CPU_XSAVE_save_and_restore); is the final part of this one. So I do not find any code that does what you are wondering about. -Don> ~Andrew_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Andrew Cooper
2013-Dec-15 18:11 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
On 15/12/2013 17:42, Don Slutz wrote:> On 12/15/13 12:22, Andrew Cooper wrote: >> On 15/12/2013 17:19, Don Slutz wrote: >>> On 12/15/13 11:51, Andrew Cooper wrote: >>>> On 15/12/2013 00:29, Don Slutz wrote: >>>>> >>>>> I think I have corrected all coding errors (please check again). >>>>> And done all requested changes. I did add the reviewed by (not >>>>> sure if I should since this changes a large part of the patch, but >>>>> they are all what Jan said). >>>>> >>>>> I have unit tested it and it appears to work the same as the >>>>> previous version (as expected). >>>>> >>>>> Here is the new version, also attached. >>>>> >>>>> From e0e8f5246ba492b153884cea93bfe753f1b0782e Mon Sep 17 00:00:00 >>>>> 2001 >>>>> From: Don Slutz <dslutz@verizon.com> >>>>> Date: Tue, 12 Nov 2013 08:22:53 -0500 >>>>> Subject: [PATCH v2 3/4] hvm_save_one: return correct data. >>>>> >>>>> It is possible that hvm_sr_handlers[typecode].save does not use all >>>>> the provided room. In that case, using: >>>>> >>>>> instance * hvm_sr_handlers[typecode].size >>>>> >>>>> does not select the correct instance. Add code to search for the >>>>> correct instance. >>>>> >>>>> Signed-off-by: Don Slutz <dslutz@verizon.com> >>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com> >>>> >>>> but this fairs no better at selecting the correct subset in the >>>> case that less data than hvm_sr_handlers[typecode].size is written >>>> by hvm_sr_handlers[typecode].save. >>>> >>> True, but the inverse is the case here; .save writes ''n'' ''size'' >>> blocks. Form the loop above: >>> >>> if ( hvm_sr_handlers[typecode].kind == HVMSR_PER_VCPU ) >>> for_each_vcpu(d, v) >>> sz += hvm_sr_handlers[typecode].size; >>> else >>> sz = hvm_sr_handlers[typecode].size; >>> >>> so sz is in multiples of ''size''. Normally sz == ctxt.cur. With >>> some offline vcpus it write fewer ''size'' blocks. >>>> It always increments by ''size'' bytes, and will only copy the data >>>> back if the bytes under desc->instance happen to match the instance >>>> we are looking for. >>>> >>> The only time it does not find one is for an offline vcpu. Try out >>> the unit test code in patch #1 on an unchanged xen. It should not >>> display anything. Then offline a cpu in a domU (echo 0 > >>> /sys/devices/system/cpu/cpu1/online). And with 3 vcpus, it will >>> report an error. >>> >>> -Don Slutz >> >> Ah - so there are actually two problems. I see now the one you are >> trying to solve, and would agree that your code does solve it. >> >> However, some of the save handlers are themselves variable length, >> and will write records shorter than hvm_sr_handlers[typecode].size if >> they can get away with doing so. In this case, the new logic still >> wont get the correct instance. >> > Not sure which one(s) you are referring to. 
> > From the full dump: > > xen-hvmctx 1| grep -i entry > Entry 0: type 1 instance 0, length 24 > Entry 1: type 2 instance 0, length 1024 > Entry 2: type 2 instance 2, length 1024 > Entry 3: type 2 instance 3, length 1024 > Entry 4: type 2 instance 4, length 1024 > Entry 5: type 2 instance 5, length 1024 > Entry 6: type 2 instance 6, length 1024 > Entry 7: type 2 instance 7, length 1024 > Entry 8: type 3 instance 0, length 8 > Entry 9: type 3 instance 1, length 8 > Entry 10: type 4 instance 0, length 400 > Entry 11: type 5 instance 0, length 24 > Entry 12: type 5 instance 1, length 24 > Entry 13: type 5 instance 2, length 24 > Entry 14: type 5 instance 3, length 24 > Entry 15: type 5 instance 4, length 24 > Entry 16: type 5 instance 5, length 24 > Entry 17: type 5 instance 6, length 24 > Entry 18: type 5 instance 7, length 24 > Entry 19: type 6 instance 0, length 1024 > Entry 20: type 6 instance 1, length 1024 > Entry 21: type 6 instance 2, length 1024 > Entry 22: type 6 instance 3, length 1024 > Entry 23: type 6 instance 4, length 1024 > Entry 24: type 6 instance 5, length 1024 > Entry 25: type 6 instance 6, length 1024 > Entry 26: type 6 instance 7, length 1024 > Entry 27: type 7 instance 0, length 16 > Entry 28: type 8 instance 0, length 8 > Entry 29: type 9 instance 0, length 8 > Entry 30: type 10 instance 0, length 56 > Entry 31: type 11 instance 0, length 16 > Entry 32: type 12 instance 0, length 1048 > Entry 33: type 13 instance 0, length 8 > Entry 34: type 14 instance 0, length 240 > Entry 35: type 14 instance 1, length 240 > Entry 36: type 14 instance 2, length 240 > Entry 37: type 14 instance 3, length 240 > Entry 38: type 14 instance 4, length 240 > Entry 39: type 14 instance 5, length 240 > Entry 40: type 14 instance 6, length 240 > Entry 41: type 14 instance 7, length 240 > Entry 42: type 16 instance 0, length 856 > Entry 43: type 16 instance 1, length 856 > Entry 44: type 16 instance 2, length 856 > Entry 45: type 16 instance 3, length 856 > Entry 46: type 16 instance 4, length 856 > Entry 47: type 16 instance 5, length 856 > Entry 48: type 16 instance 6, length 856 > Entry 49: type 16 instance 7, length 856 > Entry 50: type 18 instance 0, length 24 > Entry 51: type 18 instance 1, length 24 > Entry 52: type 18 instance 2, length 24 > Entry 53: type 18 instance 3, length 24 > Entry 54: type 18 instance 4, length 24 > Entry 55: type 18 instance 5, length 24 > Entry 56: type 18 instance 6, length 24 > Entry 57: type 18 instance 7, length 24 > Entry 58: type 19 instance 0, length 8 > Entry 59: type 19 instance 1, length 8 > Entry 60: type 19 instance 2, length 8 > Entry 61: type 19 instance 3, length 8 > Entry 62: type 19 instance 4, length 8 > Entry 63: type 19 instance 5, length 8 > Entry 64: type 19 instance 6, length 8 > Entry 65: type 19 instance 7, length 8 > Entry 66: type 0 instance 0, length 0 > > All typecode''s appear to save the same amount per instance. > > Most use hvm_save_entry: > > ... 
> _hvm_write_entry((_h), (_src), HVM_SAVE_LENGTH(_x)); \ > > and > > /* Syntactic sugar around that function: specify the max number > of > * saves, and this calculates the size of buffer needed */ > #define HVM_REGISTER_SAVE_RESTORE(_x, _save, _load, _num, > _k) \ > static int __init > __hvm_register_##_x##_save_and_restore(void) \ > { > \ > > hvm_register_savevm(HVM_SAVE_CODE(_x), \ > > #_x, \ > > &_save, \ > > &_load, \ > (_num) * > (HVM_SAVE_LENGTH(_x) \ > + sizeof (struct > hvm_save_descriptor)), \ > > _k); \ > return > 0; \ > } > \ > > I do not find any that call on _hvm_write_entry directly. > > The only special one I found: CPU_XSAVE_CODE > > Still "writes" a full sized entry: > > if ( _hvm_init_entry(h, CPU_XSAVE_CODE, v->vcpu_id, > HVM_CPU_XSAVE_SIZE) ) > return 1; > ctxt = (struct hvm_hw_cpu_xsave *)&h->data[h->cur]; > h->cur += HVM_CPU_XSAVE_SIZE; > memset(ctxt, 0, HVM_CPU_XSAVE_SIZE); > > It then modifies the zeros conditionaly. > > if ( v->fpu_initialised ) > memcpy(&ctxt->save_area, > v->arch.xsave_area, xsave_cntxt_size); > > #define HVM_CPU_XSAVE_SIZE (3 * sizeof(uint64_t) + xsave_cntxt_size) > > is part of this. > > /* We need variable length data chunk for xsave area, hence > customized > * declaration other than > HVM_REGISTER_SAVE_RESTORE. > > */ > static int __init __hvm_register_CPU_XSAVE_save_and_restore(void) > { > hvm_register_savevm(CPU_XSAVE_CODE, > "CPU_XSAVE", > hvm_save_cpu_xsave_states, > hvm_load_cpu_xsave_states, > HVM_CPU_XSAVE_SIZE + sizeof (struct > hvm_save_descriptor), > HVMSR_PER_VCPU); > return 0; > } > __initcall(__hvm_register_CPU_XSAVE_save_and_restore); > > is the final part of this one. So I do not find any code that does > what you are wondering about. > > -Don >HVM_CPU_XSAVE_SIZE() changes depending on which xsave features have ever been enabled by a vcpu (size is proportional to the contents of v->arch.xcr0_accum). It is not guaranteed to be the same for each vcpu in a domain, (although almost certainly will be the same for any recognisable OS) Jan''s new generic MSR save record will also write less than the maximum if it can. ~Andrew _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Don Slutz
2013-Dec-15 18:41 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
On 12/15/13 13:11, Andrew Cooper wrote:> On 15/12/2013 17:42, Don Slutz wrote: >> On 12/15/13 12:22, Andrew Cooper wrote: >>> On 15/12/2013 17:19, Don Slutz wrote: >>>> On 12/15/13 11:51, Andrew Cooper wrote: >>>>> On 15/12/2013 00:29, Don Slutz wrote: >>>>>> >>>>>> I think I have corrected all coding errors (please check again). >>>>>> And done all requested changes. I did add the reviewed by (not >>>>>> sure if I should since this changes a large part of the patch, >>>>>> but they are all what Jan said). >>>>>> >>>>>> I have unit tested it and it appears to work the same as the >>>>>> previous version (as expected). >>>>>> >>>>>> Here is the new version, also attached. >>>>>> >>>>>> From e0e8f5246ba492b153884cea93bfe753f1b0782e Mon Sep 17 00:00:00 >>>>>> 2001 >>>>>> From: Don Slutz <dslutz@verizon.com> >>>>>> Date: Tue, 12 Nov 2013 08:22:53 -0500 >>>>>> Subject: [PATCH v2 3/4] hvm_save_one: return correct data. >>>>>> >>>>>> It is possible that hvm_sr_handlers[typecode].save does not use all >>>>>> the provided room. In that case, using: >>>>>> >>>>>> instance * hvm_sr_handlers[typecode].size >>>>>> >>>>>> does not select the correct instance. Add code to search for the >>>>>> correct instance. >>>>>> >>>>>> Signed-off-by: Don Slutz <dslutz@verizon.com> >>>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com> >>>>> >>>>> but this fairs no better at selecting the correct subset in the >>>>> case that less data than hvm_sr_handlers[typecode].size is written >>>>> by hvm_sr_handlers[typecode].save. >>>>> >>>> True, but the inverse is the case here; .save writes ''n'' ''size'' >>>> blocks. Form the loop above: >>>> >>>> if ( hvm_sr_handlers[typecode].kind == HVMSR_PER_VCPU ) >>>> for_each_vcpu(d, v) >>>> sz += hvm_sr_handlers[typecode].size; >>>> else >>>> sz = hvm_sr_handlers[typecode].size; >>>> >>>> so sz is in multiples of ''size''. Normally sz == ctxt.cur. With >>>> some offline vcpus it write fewer ''size'' blocks. >>>>> It always increments by ''size'' bytes, and will only copy the data >>>>> back if the bytes under desc->instance happen to match the >>>>> instance we are looking for. >>>>> >>>> The only time it does not find one is for an offline vcpu. Try out >>>> the unit test code in patch #1 on an unchanged xen. It should not >>>> display anything. Then offline a cpu in a domU (echo 0 > >>>> /sys/devices/system/cpu/cpu1/online). And with 3 vcpus, it will >>>> report an error. >>>> >>>> -Don Slutz >>> >>> Ah - so there are actually two problems. I see now the one you are >>> trying to solve, and would agree that your code does solve it. >>> >>> However, some of the save handlers are themselves variable length, >>> and will write records shorter than hvm_sr_handlers[typecode].size >>> if they can get away with doing so. In this case, the new logic >>> still wont get the correct instance. >>> >> Not sure which one(s) you are referring to. 
>> >> From the full dump: >> >> xen-hvmctx 1| grep -i entry >> Entry 0: type 1 instance 0, length 24 >> Entry 1: type 2 instance 0, length 1024 >> Entry 2: type 2 instance 2, length 1024 >> Entry 3: type 2 instance 3, length 1024 >> Entry 4: type 2 instance 4, length 1024 >> Entry 5: type 2 instance 5, length 1024 >> Entry 6: type 2 instance 6, length 1024 >> Entry 7: type 2 instance 7, length 1024 >> Entry 8: type 3 instance 0, length 8 >> Entry 9: type 3 instance 1, length 8 >> Entry 10: type 4 instance 0, length 400 >> Entry 11: type 5 instance 0, length 24 >> Entry 12: type 5 instance 1, length 24 >> Entry 13: type 5 instance 2, length 24 >> Entry 14: type 5 instance 3, length 24 >> Entry 15: type 5 instance 4, length 24 >> Entry 16: type 5 instance 5, length 24 >> Entry 17: type 5 instance 6, length 24 >> Entry 18: type 5 instance 7, length 24 >> Entry 19: type 6 instance 0, length 1024 >> Entry 20: type 6 instance 1, length 1024 >> Entry 21: type 6 instance 2, length 1024 >> Entry 22: type 6 instance 3, length 1024 >> Entry 23: type 6 instance 4, length 1024 >> Entry 24: type 6 instance 5, length 1024 >> Entry 25: type 6 instance 6, length 1024 >> Entry 26: type 6 instance 7, length 1024 >> Entry 27: type 7 instance 0, length 16 >> Entry 28: type 8 instance 0, length 8 >> Entry 29: type 9 instance 0, length 8 >> Entry 30: type 10 instance 0, length 56 >> Entry 31: type 11 instance 0, length 16 >> Entry 32: type 12 instance 0, length 1048 >> Entry 33: type 13 instance 0, length 8 >> Entry 34: type 14 instance 0, length 240 >> Entry 35: type 14 instance 1, length 240 >> Entry 36: type 14 instance 2, length 240 >> Entry 37: type 14 instance 3, length 240 >> Entry 38: type 14 instance 4, length 240 >> Entry 39: type 14 instance 5, length 240 >> Entry 40: type 14 instance 6, length 240 >> Entry 41: type 14 instance 7, length 240 >> Entry 42: type 16 instance 0, length 856 >> Entry 43: type 16 instance 1, length 856 >> Entry 44: type 16 instance 2, length 856 >> Entry 45: type 16 instance 3, length 856 >> Entry 46: type 16 instance 4, length 856 >> Entry 47: type 16 instance 5, length 856 >> Entry 48: type 16 instance 6, length 856 >> Entry 49: type 16 instance 7, length 856 >> Entry 50: type 18 instance 0, length 24 >> Entry 51: type 18 instance 1, length 24 >> Entry 52: type 18 instance 2, length 24 >> Entry 53: type 18 instance 3, length 24 >> Entry 54: type 18 instance 4, length 24 >> Entry 55: type 18 instance 5, length 24 >> Entry 56: type 18 instance 6, length 24 >> Entry 57: type 18 instance 7, length 24 >> Entry 58: type 19 instance 0, length 8 >> Entry 59: type 19 instance 1, length 8 >> Entry 60: type 19 instance 2, length 8 >> Entry 61: type 19 instance 3, length 8 >> Entry 62: type 19 instance 4, length 8 >> Entry 63: type 19 instance 5, length 8 >> Entry 64: type 19 instance 6, length 8 >> Entry 65: type 19 instance 7, length 8 >> Entry 66: type 0 instance 0, length 0 >> >> All typecode''s appear to save the same amount per instance. >> >> Most use hvm_save_entry: >> >> ... 
>> _hvm_write_entry((_h), (_src), HVM_SAVE_LENGTH(_x)); \ >> >> and >> >> /* Syntactic sugar around that function: specify the max number of >> * saves, and this calculates the size of buffer needed */ >> #define HVM_REGISTER_SAVE_RESTORE(_x, _save, _load, _num, >> _k) \ >> static int __init >> __hvm_register_##_x##_save_and_restore(void) \ >> { \ >> hvm_register_savevm(HVM_SAVE_CODE(_x), \ >> #_x, \ >> &_save, \ >> &_load, \ >> (_num) * >> (HVM_SAVE_LENGTH(_x) \ >> + sizeof (struct >> hvm_save_descriptor)), \ >> _k); \ >> return >> 0; \ >> } \ >> >> I do not find any that call on _hvm_write_entry directly. >> >> The only special one I found: CPU_XSAVE_CODE >> >> Still "writes" a full sized entry: >> >> if ( _hvm_init_entry(h, CPU_XSAVE_CODE, v->vcpu_id, >> HVM_CPU_XSAVE_SIZE) ) >> return 1; >> ctxt = (struct hvm_hw_cpu_xsave *)&h->data[h->cur]; >> h->cur += HVM_CPU_XSAVE_SIZE; >> memset(ctxt, 0, HVM_CPU_XSAVE_SIZE); >> >> It then modifies the zeros conditionaly. >> >> if ( v->fpu_initialised ) >> memcpy(&ctxt->save_area, >> v->arch.xsave_area, xsave_cntxt_size); >> >> #define HVM_CPU_XSAVE_SIZE (3 * sizeof(uint64_t) + xsave_cntxt_size) >> >> is part of this. >> >> /* We need variable length data chunk for xsave area, hence customized >> * declaration other than HVM_REGISTER_SAVE_RESTORE. >> */ >> static int __init __hvm_register_CPU_XSAVE_save_and_restore(void) >> { >> hvm_register_savevm(CPU_XSAVE_CODE, >> "CPU_XSAVE", >> hvm_save_cpu_xsave_states, >> hvm_load_cpu_xsave_states, >> HVM_CPU_XSAVE_SIZE + sizeof (struct >> hvm_save_descriptor), >> HVMSR_PER_VCPU); >> return 0; >> } >> __initcall(__hvm_register_CPU_XSAVE_save_and_restore); >> >> is the final part of this one. So I do not find any code that does >> what you are wondering about. >> >> -Don >> > > HVM_CPU_XSAVE_SIZE() changes depending on which xsave features have > ever been enabled by a vcpu (size is proportional to the contents of > v->arch.xcr0_accum). It is not guaranteed to be the same for each > vcpu in a domain, (although almost certainly will be the same for any > recognisable OS) >Ah, I see. Well, hvm_save_one, hvm_save_size, and hvm_save all expect that hvm_sr_handlers[typecode].size has the max size. I do not see that being true for XSAVE.> Jan''s new generic MSR save record will also write less than the > maximum if it can. >This looks to be Jan''s patch: http://lists.xen.org/archives/html/xen-devel/2013-12/msg02061.html Does look to set hvm_sr_handlers[typecode].size to the max size. And it looks like the code I did in patch #4 would actually fix this issue. Since it now uses the length stored in the save descriptor to find each instance. Jan has some questions about patch #4; so what to do about it is still pending. Clearly I can merge #3 and #4 into 1 patch. -Don Slutz> ~Andrew_______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Andrew Cooper
2013-Dec-15 19:06 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
On 15/12/2013 18:41, Don Slutz wrote:
> On 12/15/13 13:11, Andrew Cooper wrote:
>> On 15/12/2013 17:42, Don Slutz wrote:
>>>
>>>
>>> is the final part of this one. So I do not find any code that does
>>> what you are wondering about.
>>>
>>> -Don
>>>
>>
>> HVM_CPU_XSAVE_SIZE() changes depending on which xsave features have
>> ever been enabled by a vcpu (size is proportional to the contents of
>> v->arch.xcr0_accum). It is not guaranteed to be the same for each
>> vcpu in a domain, (although almost certainly will be the same for any
>> recognisable OS)
>>
> Ah, I see.
>
> Well, hvm_save_one, hvm_save_size, and hvm_save all expect that
> hvm_sr_handlers[typecode].size has the max size. I do not see that
> being true for XSAVE.

hvm_sr_handlers[typecode].size does need to be the maximum possible size. It does not mean that the maximum amount of data will be written.

So long as the load on the far side can read the somewhat-shorter-than-maximum save record, it doesn't matter (except for hvm_save_one). hvm_save_size specifically needs to return the maximum size possible, so the toolstack can allocate a big enough buffer. xc_domain_save() does correctly deal with Xen handing back less than the maximum when actually saving the domain.

>> Jan's new generic MSR save record will also write less than the
>> maximum if it can.
>>
> This looks to be Jan's patch:
>
> http://lists.xen.org/archives/html/xen-devel/2013-12/msg02061.html
>
> Does look to set hvm_sr_handlers[typecode].size to the max size.
>
> And it looks like the code I did in patch #4 would actually fix this
> issue, since it now uses the length stored in the save descriptor to
> find each instance.
>
> Jan has some questions about patch #4; so what to do about it is still
> pending.
>
> Clearly I can merge #3 and #4 into 1 patch.
>
> -Don Slutz
>> ~Andrew
>

As I said, to fix this newest problem I am experimenting with splitting the per-dom and per-vcpu save handlers, and making good progress. It does mean that the fix for #3 would be much simpler.

I shall send out a very RFC series as soon as I can.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
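For what it is worth, one possible shape of the split Andrew describes, purely as a hypothetical sketch to make the idea concrete: the name hvm_save_vcpu_handler and the .save_one field are assumptions, not the interface of his eventual series.

    /* Hypothetical sketch only: a per-vcpu save hook that is told which
     * vcpu (instance) to emit, so hvm_save_one() can ask for exactly the
     * record it needs instead of dumping every instance and searching. */
    typedef int (*hvm_save_vcpu_handler)(struct vcpu *v,
                                         hvm_domain_context_t *h);

    /* hvm_save_one() for a HVMSR_PER_VCPU type would then do roughly: */
    struct vcpu *v = (instance < d->max_vcpus) ? d->vcpu[instance] : NULL;

    if ( v == NULL )
        rv = -ENOENT;
    else
        rv = hvm_sr_handlers[typecode].save_one(v, &ctxt);  /* hypothetical field */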
Don Slutz
2013-Dec-15 19:23 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
On 12/15/13 14:06, Andrew Cooper wrote:> On 15/12/2013 18:41, Don Slutz wrote: >> On 12/15/13 13:11, Andrew Cooper wrote: >>> On 15/12/2013 17:42, Don Slutz wrote: >>>> >>>> >>>> is the final part of this one. So I do not find any code that does >>>> what you are wondering about. >>>> >>>> -Don >>>> >>> >>> HVM_CPU_XSAVE_SIZE() changes depending on which xsave features have >>> ever been enabled by a vcpu (size is proportional to the contents of >>> v->arch.xcr0_accum). It is not guaranteed to be the same for each >>> vcpu in a domain, (although almost certainly will be the same for >>> any recognisable OS) >>> >> Ah, I see. >> >> Well, hvm_save_one, hvm_save_size, and hvm_save all expect that >> hvm_sr_handlers[typecode].size has the max size. I do not see that >> being true for XSAVE. > > hvm_sr_handlers[typecode].size does need to be the maximum possible > size. It does not mean that the maximum amount of data will be written. > > So long as the load on the far side can read the > somewhat-shorter-than-maximum save record, it doesn''t matter (except > hvm_save_one). hvm_save_size specifically need to return the maximum > size possible, so the toolstack can allocate a big enough buffer. > xc_domain_save() does correctly deal with Xen handing back less than > the maximum when actually saving the domain. > >>> Jan''s new generic MSR save record will also write less than the >>> maximum if it can. >>> >> This looks to be Jan''s patch: >> >> http://lists.xen.org/archives/html/xen-devel/2013-12/msg02061.html >> >> Does look to set hvm_sr_handlers[typecode].size to the max size. >> >> And it looks like the code I did in patch #4 would actually fix this >> issue. Since it now uses the length stored in the save descriptor to >> find each instance. >> >> Jan has some questions about patch #4; so what to do about it is >> still pending. >> >> Clearly I can merge #3 and #4 into 1 patch. >> >> -Don Slutz >>> ~Andrew >> >> >> > > As I said, to fix this newest problem I am experimenting with > splitting the per-dom and per-vcpu save handlers, and making good > progress. It does mean that the fix for #3 would be much much more > simple. > > I shall send out a very RFC series as soon as I can > > ~AndrewGreat, I look forward to seeing them. -Don Slutz _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Dec-16 08:17 UTC
Re: [BUGFIX][PATCH 3/4] hvm_save_one: return correct data.
>>> On 15.12.13 at 17:51, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 15/12/2013 00:29, Don Slutz wrote:
>>
>> I think I have corrected all coding errors (please check again). And
>> done all requested changes. I did add the Reviewed-by (not sure if I
>> should since this changes a large part of the patch, but they are all
>> what Jan said).
>>
>> I have unit tested it and it appears to work the same as the previous
>> version (as expected).
>>
>> Here is the new version, also attached.
>>
>> From e0e8f5246ba492b153884cea93bfe753f1b0782e Mon Sep 17 00:00:00 2001
>> From: Don Slutz <dslutz@verizon.com>
>> Date: Tue, 12 Nov 2013 08:22:53 -0500
>> Subject: [PATCH v2 3/4] hvm_save_one: return correct data.
>>
>> It is possible that hvm_sr_handlers[typecode].save does not use all
>> the provided room. In that case, using:
>>
>>   instance * hvm_sr_handlers[typecode].size
>>
>> does not select the correct instance. Add code to search for the
>> correct instance.
>>
>> Signed-off-by: Don Slutz <dslutz@verizon.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> but this fares no better at selecting the correct subset in the case
> that less data than hvm_sr_handlers[typecode].size is written by
> hvm_sr_handlers[typecode].save.

Oh, yes, indeed.

> It always increments by 'size' bytes, and will only copy the data back
> if the bytes under desc->instance happen to match the instance we are
> looking for.
>
> The only solution I can see is that for the per-vcpu records, the save
> functions get refactored to take an instance ID, and only save their
> specific instance.

I don't see why you shouldn't be able to look at the descriptor instead - that one does have the correct size, doesn't it?

Jan
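A sketch of what Jan's suggestion looks like in practice: stepping and copying by the length recorded in each descriptor rather than by the fixed per-type size, which is also what Don says patch #4 does. It uses the same locals as hvm_save_one() in the quoted patch; this is a simplified sketch, not the committed code.

    /* Advance by what each record says it occupies, so records shorter
     * than hvm_sr_handlers[typecode].size (e.g. a trimmed xsave block)
     * are stepped over correctly, and copy out only desc->length bytes.
     * Note the loop increment reads desc, which is set at the top of
     * each iteration. */
    uint32_t off;
    const struct hvm_save_descriptor *desc;

    rv = -EBADSLT;
    for ( off = 0; off < ctxt.cur; off += sizeof(*desc) + desc->length )
    {
        desc = (const void *)&ctxt.data[off];

        if ( desc->typecode == typecode && desc->instance == instance )
        {
            rv = 0;
            if ( copy_to_guest(handle,
                               ctxt.data + off + sizeof(*desc),
                               desc->length) )
                rv = -EFAULT;
            break;
        }
    }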
Jan Beulich
2013-Dec-16 08:22 UTC
Re: [BUGFIX][PATCH 4/4] hvm_save_one: allow the 2nd instance to be fetched for PIC.
>>> On 15.12.13 at 02:38, Don Slutz <dslutz@verizon.com> wrote:
> On 12/13/13 09:38, Jan Beulich wrote:
>>>>> On 12.12.13 at 01:56, Don Slutz <dslutz@verizon.com> wrote:
>>> @@ -114,20 +111,20 @@ int hvm_save_one(struct domain *d, uint16_t typecode, uint16_t instance,
>>>      }
>>>      else
>>>      {
>>> -        uint32_t off;
>>> +        uint32_t off, add;
>> "add" is a pretty odd name for what this is being used for. Why
>> don't you use desc->length directly?
> I picked the name when the 3rd part of the "for" was "off += add".
> During unit testing that did not work and so I did not try and pick a
> new one. I could have added add2:
>
>     const uint32_t add2 = sizeof (struct hvm_save_descriptor);
>
> And then the last part of the for becomes "off += add + add2".
>
> I cannot use desc->length because:
>
>     save.c: In function 'hvm_save_one':
>     save.c:120:47: error: 'desc' undeclared (first use in this function)
>     save.c:120:47: note: each undeclared identifier is reported only once
>     for each function it appears in
>
> (It is only defined in the body of the for).

Right - so simply move it into the next surrounding scope.

>> Which shows another shortcoming of the interface: There's no
>> buffer size being passed in from the caller, yet we have variable
>> size records. Which means latent data corruption in the caller.
> This is not 100% correct. The libxl code for
> xc_domain_hvm_getcontext_partial does take a length:
>
>     /* Get just one element of the HVM guest context.
>      * size must be >= HVM_SAVE_LENGTH(type) */
>     int xc_domain_hvm_getcontext_partial(xc_interface *xch,
>                                          uint32_t domid,
>                                          uint16_t typecode,
>                                          uint16_t instance,
>                                          void *ctxt_buf,
>                                          uint32_t size)
>     {

Which doesn't mean anything for the hypervisor interface. Yet that's the one I said has the limitation.

> and it gets embedded in
>
>     DECLARE_HYPERCALL_BOUNCE(ctxt_buf, size, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
>
> which is the handle. I do not know that much about
> "XEN_GUEST_HANDLE_64(uint8) handle" and DECLARE_HYPERCALL_BOUNCE, but
> what my unit testing has shown me is that copy_to_guest(handle, <ptr>,
> <nr>) only copies up to the size stored in the handle. It looks to zero
> an unknown amount more (looks to be page sized). So there is more
> needed here.

Again, you need to split the way libxc works from the actual hypercall: a handle _never_ tells you how many items it refers to; there is always a second parameter required to pass the element count (unless known by other means).

Jan
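To illustrate Jan's last point: the guest handle itself carries no length, so bounding the copy on the hypervisor side would need an explicit size passed down with the domctl. The buf_size below is hypothetical (no such field exists in the interface discussed here); the snippet only shows how the copy in hvm_save_one() could be clamped if it did, using the same locals as the descriptor-length loop sketched earlier.

    /* Hypothetical only: if the domctl also carried a buffer size,
     * hvm_save_one() could clamp the copy to it.  No such field exists
     * in the interface discussed in this thread. */
    uint32_t copy_len = desc->length;          /* what the record holds      */

    if ( copy_len > buf_size )                 /* buf_size: hypothetical     */
        copy_len = buf_size;                   /* caller-supplied byte bound */

    if ( copy_to_guest(handle, ctxt.data + off + sizeof(*desc), copy_len) )
        rv = -EFAULT;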