Konrad Rzeszutek Wilk
2012-Mar-24 17:41 UTC
[PATCH 0 of 6] Various patches - debug help, xend fixes and S3 resume fixes (v1).
Hello,

The majority of these patches are for Xend and for some of the operations it can do. I know that the Python stack is deprecated, but some distros and companies still use it, so it makes sense to carry them until those companies (cough cough) convert over to xl.

The minority of the patches are:
 - Fix to PAT when resuming from S3 - this is a fix from VirtualComputer and the authorship should be theirs.
 - A VGA debug facility when you don't have any other mechanism to debug
 - And a fix to the xencommons to load the Xen ACPI processor which has been accepted upstream.

Please pull or advise on whether I should split these patches up, etc.

 tools/python/xen/lowlevel/xc/xc.c    |  247 ++++++++++++++++++++++++++++++++++
 tools/python/xen/util/pci.py         |    7 ++++-
 tools/python/xen/xend/XendConfig.py  |   13 ++++++++
 tools/python/xen/xend/image.py       |   10 ++++++-
 tools/python/xen/xm/create.py        |    8 ++++-
 tools/python/xen/xm/xenapi_create.py |    7 ++++-
 xen/arch/x86/acpi/power.c            |   18 ++++++++++++
 xen/drivers/video/vga.c              |    7 ++++-
 8 files changed, 312 insertions(+), 5 deletions(-)
Konrad Rzeszutek Wilk
2012-Mar-24 17:41 UTC
[PATCH 1 of 6] linux-xencommons: Load xen-acpi-processor
# HG changeset patch
# User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
# Date 1332608051 14400
# Node ID 708bf485e5d4baaf1c6272bd4fb1895a3c355d3e
# Parent b7a1794aed59fd2b0816b3bbcf97690c0241814f
linux-xencommons: Load xen-acpi-processor

Upstream, the commit "xen/acpi-processor: C and P-state driver that uploads said data to hypervisor." takes care of uploading the power information that a CPU frequency scaling driver would normally use in the initial domain. We want the hypervisor to take that data and make good use of it.

Fortunately for us we do not have to worry about the native CPU frequency scaling drivers being loaded first, as the upstream commit "xen/cpufreq: Disable the cpu frequency scaling drivers from loading." takes care of that. This means we can load xen-acpi-processor at any time.

By default that driver is built as a module - and since we are the only user of it - we should load it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff -r b7a1794aed59 -r 708bf485e5d4 tools/hotplug/Linux/init.d/xencommons
--- a/tools/hotplug/Linux/init.d/xencommons Fri Mar 23 13:45:28 2012 +0000
+++ b/tools/hotplug/Linux/init.d/xencommons Sat Mar 24 12:54:11 2012 -0400
@@ -58,6 +58,7 @@ do_start () {
     modprobe xen-gntdev 2>/dev/null
     modprobe evtchn 2>/dev/null
     modprobe gntdev 2>/dev/null
+    modprobe xen-acpi-processor 2>/dev/null
     mkdir -p /var/run/xen
 
     if ! `xenstore-read -s / >/dev/null 2>&1`
Konrad Rzeszutek Wilk
2012-Mar-24 17:41 UTC
[PATCH 2 of 6] xen/vga: Add 'vga_delay' parameter to delay screen output by 2 second per screen output
# HG changeset patch
# User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
# Date 1332608052 14400
# Node ID d097c3ba42f601af65b53a0c84973855aab64aa9
# Parent 708bf485e5d4baaf1c6272bd4fb1895a3c355d3e
xen/vga: Add 'vga_delay' parameter to delay screen output by 2 second per screen output.

This is useful if you find yourself on a machine that has no serial console, nor any PCI or PCIe slot to put a serial card in. Nothing really fancy, except that it allows you to capture the screen using a camera.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff -r 708bf485e5d4 -r d097c3ba42f6 xen/drivers/video/vga.c
--- a/xen/drivers/video/vga.c Sat Mar 24 12:54:11 2012 -0400
+++ b/xen/drivers/video/vga.c Sat Mar 24 12:54:12 2012 -0400
@@ -10,7 +10,7 @@
 #include <xen/mm.h>
 #include <xen/vga.h>
 #include <asm/io.h>
-
+#include <xen/delay.h>
 /* Filled in by arch boot code. */
 struct xen_vga_console_info vga_console_info;
 
@@ -49,6 +49,8 @@ void (*vga_puts)(const char *) = vga_noo
 static char __initdata opt_vga[30] = "";
 string_param("vga", opt_vga);
 
+static bool_t __read_mostly vga_delay;
+boolean_param("vga_delay", vga_delay);
 /* VGA text-mode definitions. */
 static unsigned int columns, lines;
 #define ATTRIBUTE 7
@@ -128,6 +130,9 @@ static void vga_text_puts(const char *s)
 
     while ( (c = *s++) != '\0' )
     {
+        if (vga_delay)
+            udelay(2000);
+
         if ( (c == '\n') || (xpos >= columns) )
         {
             if ( ++ypos >= lines )
Konrad Rzeszutek Wilk
2012-Mar-24 17:41 UTC
[PATCH 3 of 6] xen/pat: After suspend re-write PAT if BIOS changed it
# HG changeset patch
# User Simon Graham <simon.graham@virtualcomputer.com>
# Date 1332610898 14400
# Node ID 75798a472b1a9121adda166b6fd05ba8473a44f0
# Parent d097c3ba42f601af65b53a0c84973855aab64aa9
xen/pat: After suspend re-write PAT if BIOS changed it.

On certain AMD machines (this was an MSI or GigaBYTE BIOS) the PAT MSR would be reset after resume, causing rather weird issues: pages (say, ones set to WC) would end up with the wrong type, as they would use the BIOS PAT instead of the one set by the hypervisor.

Signed-off-by: Simon Graham <simon.graham@virtualcomputer.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff -r d097c3ba42f6 -r 75798a472b1a xen/arch/x86/acpi/power.c
--- a/xen/arch/x86/acpi/power.c Sat Mar 24 12:54:12 2012 -0400
+++ b/xen/arch/x86/acpi/power.c Sat Mar 24 13:41:38 2012 -0400
@@ -41,8 +41,25 @@ static DEFINE_SPINLOCK(pm_lock);
 
 struct acpi_sleep_info acpi_sinfo;
 
+static void pat_resume(void);
 void do_suspend_lowlevel(void);
 
+static void
+pat_resume()
+{
+    u64 pat;
+
+    rdmsrl(MSR_IA32_CR_PAT, pat);
+    if (pat != host_pat) {
+        printk(KERN_INFO PREFIX "Found PAT MSR: 0x%lx\n", pat);
+        printk(KERN_INFO PREFIX "reseting to 0x%lx\n", host_pat);
+        wrmsrl(MSR_IA32_CR_PAT, host_pat);
+        rdmsrl(MSR_IA32_CR_PAT, pat);
+        if (pat != host_pat)
+            printk(KERN_WARNING PREFIX "PAT MSR stuck on: 0x%lx\n", pat);
+    }
+}
+
 static int device_power_down(void)
 {
     console_suspend();
@@ -194,6 +211,7 @@ static int enter_state(u32 state)
     if ( cpu_has_efer )
         write_efer(read_efer());
 
+    pat_resume();
     device_power_up();
 
     mcheck_init(&boot_cpu_data, 0);
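
To illustrate why a rewritten PAT MSR produces the "wrong type" described in the commit message: the MSR holds eight one-byte entries and page-table bits pick an entry by index, so the same page-table encoding selects a different memory type once the register contents change. The following is a minimal sketch in Python, not hypervisor code. BIOS_PAT is the architectural power-on default; HOST_PAT is only an illustrative value with WC in slot 4, and both constant names and both helper functions are assumptions made for this example.

```python
# Memory-type encodings stored in the low 3 bits of each PAT entry.
PAT_TYPES = {0: "UC", 1: "WC", 4: "WT", 5: "WP", 6: "WB", 7: "UC-"}

BIOS_PAT = 0x0007040600070406  # architectural power-on default
HOST_PAT = 0x0000050100070406  # illustrative only: WC placed in slot 4 (assumption)


def decode_pat(msr):
    """Split a 64-bit PAT MSR value into its eight memory types (slot 0 first)."""
    return [PAT_TYPES.get((msr >> (8 * i)) & 0x7, "reserved") for i in range(8)]


def pat_needs_rewrite(current, expected):
    """Same test the patch performs: rewrite only when the MSR differs."""
    return current != expected


if __name__ == "__main__":
    print("BIOS:", decode_pat(BIOS_PAT))
    print("host:", decode_pat(HOST_PAT))
    print("rewrite needed:", pat_needs_rewrite(BIOS_PAT, HOST_PAT))
```

With these two values a mapping that selects slot 4 is WC under the hypervisor's table but WB under the BIOS default, which is the kind of mismatch the patch is guarding against.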
Konrad Rzeszutek Wilk
2012-Mar-24 17:41 UTC
[PATCH 4 of 6] xend: Don't crash due to weird PCI devices
# HG changeset patch
# User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
# Date 1332610898 14400
# Node ID d42921da3931026ecf5da7c0e5bb86074e77cf71
# Parent 75798a472b1a9121adda166b6fd05ba8473a44f0
xend: Don't crash due to weird PCI devices

This fixes Red Hat BZ 767742 where a user had some truly weird PCI devices:

$ lspci -vvv -xxx -s 0000:01:00.0
01:00.0 VGA compatible controller: nVidia Corporation GT218 [NVS 3100M] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
10: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
20: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

And xend would report:

ERROR (pci:1272) Caught 'Looped capability chain: 0000:01:00.0'

This fixes it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff -r 75798a472b1a -r d42921da3931 tools/python/xen/util/pci.py
--- a/tools/python/xen/util/pci.py Sat Mar 24 13:41:38 2012 -0400
+++ b/tools/python/xen/util/pci.py Sat Mar 24 13:41:38 2012 -0400
@@ -1268,7 +1268,12 @@ class PciDevice:
             pass
 
     def get_info_from_sysfs(self):
-        self.find_capability(0x11)
+        try:
+            self.find_capability(0x11)
+        except PciDeviceParseError, err:
+            log.error("Caught '%s'" % err)
+            return False
+
         sysfs_mnt = find_sysfs_mnt()
         if sysfs_mnt == None:
             return False
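
For readers wondering how a configuration space of all 0xff leads to a "looped capability chain": a generic capability walk follows next-pointers starting at offset 0x34, and with every byte set to 0xff each pointer leads back to the same offset. The sketch below is a simplified model in Python 3, not xend's find_capability(); the names find_capability, CapabilityChainError and probe are hypothetical and only mirror the shape of the fix (catch the error, report failure).

```python
PCI_STATUS = 0x06            # status register offset
PCI_STATUS_CAP_LIST = 0x10   # "capabilities list present" bit
PCI_CAPABILITY_LIST = 0x34   # offset of the first capability pointer


class CapabilityChainError(Exception):
    """Raised when the capability list revisits an offset."""


def find_capability(cfg, cap_id):
    """Return the offset of cap_id in a 256-byte config space, or None."""
    status = cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    seen = set()
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC   # pointers are dword aligned
    while pos:
        if pos in seen:
            raise CapabilityChainError("Looped capability chain")
        seen.add(pos)
        if cfg[pos] == cap_id:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return None


def probe(cfg):
    """Hypothetical caller: treat a broken chain as 'device unusable'."""
    try:
        find_capability(cfg, 0x11)   # 0x11 is the MSI-X capability ID
    except CapabilityChainError:
        return False
    return True


if __name__ == "__main__":
    print(probe(bytes([0xFF] * 256)))   # the device from the bug report
```

A device that reads back as all 0xff makes the walk land on offset 0xfc twice, so the loop check fires on the second step instead of spinning forever.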
Konrad Rzeszutek Wilk
2012-Mar-24 17:41 UTC
[PATCH 5 of 6] xend/xc: Implement a domain_set_e820_hole function to be used by python code
# HG changeset patch
# User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
# Date 1332610898 14400
# Node ID 95eda76084314aa8a5cfd4b5e83969823492deda
# Parent d42921da3931026ecf5da7c0e5bb86074e77cf71
xend/xc: Implement a domain_set_e820_hole function to be used by python code

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff -r d42921da3931 -r 95eda7608431 tools/python/xen/lowlevel/xc/xc.c
--- a/tools/python/xen/lowlevel/xc/xc.c Sat Mar 24 13:41:38 2012 -0400
+++ b/tools/python/xen/lowlevel/xc/xc.c Sat Mar 24 13:41:38 2012 -0400
@@ -16,6 +16,7 @@
 #include <sys/mman.h>
 #include <netdb.h>
 #include <arpa/inet.h>
+#include <stdio.h>
 
 #include "xenctrl.h"
 #include <xen/elfnote.h>
@@ -1697,6 +1698,243 @@ static PyObject *pyxc_domain_set_memmap_
     return zero;
 }
 
+#ifdef PRIu64
+static const char *e820_names(int type)
+{
+    switch (type) {
+        case E820_RAM: return "RAM";
+        case E820_RESERVED: return "Reserved";
+        case E820_ACPI: return "ACPI";
+        case E820_NVS: return "ACPI NVS";
+        case E820_UNUSABLE: return "Unusable";
+        default: break;
+    }
+    return "Unknown";
+}
+#endif
+static int e820_sanitize(struct e820entry src[],
+                         uint32_t *nr_entries,
+                         unsigned long map_limitkb,
+                         unsigned long balloon_kb)
+{
+    uint64_t delta_kb = 0, start = 0, start_kb = 0, last = 0, ram_end;
+    uint32_t i, idx = 0, nr;
+    struct e820entry e820[E820MAX];
+
+    if (!src || !map_limitkb || !balloon_kb || !nr_entries)
+        return -EINVAL;
+
+    nr = *nr_entries;
+    if (!nr)
+        return -EINVAL;
+
+    if (nr > E820MAX)
+        return -EINVAL;
+
+    /* Weed out anything under 1MB */
+    for (i = 0; i < nr; i++) {
+        if (src[i].addr > 0x100000)
+            continue;
+
+        src[i].type = 0;
+        src[i].size = 0;
+        src[i].addr = -1ULL;
+    }
+
+    /* Find the lowest and highest entry in E820, skipping over
+     * undesired entries. */
+    start = -1ULL;
+    last = 0;
+    for (i = 0; i < nr; i++) {
+        if ((src[i].type == E820_RAM) ||
+            (src[i].type == E820_UNUSABLE) ||
+            (src[i].type == 0))
+            continue;
+
+        start = src[i].addr < start ? src[i].addr : start;
+        last = src[i].addr + src[i].size > last ?
+               src[i].addr + src[i].size > last : last;
+    }
+    if (start > 1024)
+        start_kb = start >> 10;
+
+    /* Add the memory RAM region for the guest */
+    e820[idx].addr = 0;
+    e820[idx].size = (uint64_t)map_limitkb << 10;
+    e820[idx].type = E820_RAM;
+
+    /* .. and trim if neccessary */
+    if (start_kb && map_limitkb > start_kb) {
+        delta_kb = map_limitkb - start_kb;
+        if (delta_kb)
+            e820[idx].size -= (uint64_t)(delta_kb << 10);
+    }
+    /* Note: We don't touch balloon_kb here. Will add it at the end. */
+    ram_end = e820[idx].addr + e820[idx].size;
+    idx ++;
+#ifdef PRIu64
+    printf("Memory: %"PRIu64"kB End of RAM: " \
+           "0x%"PRIx64" (PFN) Delta: %"PRIu64"kB, PCI start: %"PRIu64"kB " \
+           "(0x%"PRIx64" PFN), Balloon %"PRIu64"kB\n", (uint64_t)map_limitkb,
+           ram_end >> 12, delta_kb, start_kb ,start >> 12,
+           (uint64_t)balloon_kb);
+#endif
+
+    /* This whole code below is to guard against if the Intel IGD is passed into
+     * the guest. If we don't pass in IGD, this whole code can be ignored.
+     *
+     * The reason for this code is that Intel boxes fill their E820 with
+     * E820_RAM amongst E820_RESERVED and we can't just ditch those E820_RAM.
+     * That is b/c any "gaps" in the E820 is considered PCI I/O space by
+     * Linux and it would be utilized by the Intel IGD as I/O space while
+     * in reality it was an RAM region.
+     *
+     * What this means is that we have to walk the E820 and for any region
+     * that is RAM and below 4GB and above ram_end, needs to change its type
+     * to E820_UNUSED. We also need to move some of the E820_RAM regions if
+     * the overlap with ram_end. */
+    for (i = 0; i < nr; i++) {
+        uint64_t end = src[i].addr + src[i].size;
+
+        /* We don't care about E820_UNUSABLE, but we need to
+         * change the type to zero b/c the loop after this
+         * sticks E820_UNUSABLE on the guest's E820 but ignores
+         * the ones with type zero. */
+        if ((src[i].type == E820_UNUSABLE) ||
+            /* Any region that is within the "RAM region" can
+             * be safely ditched. */
+            (end < ram_end)) {
+            src[i].type = 0;
+            continue;
+        }
+
+        /* Look only at RAM regions. */
+        if (src[i].type != E820_RAM)
+            continue;
+
+        /* We only care about RAM regions below 4GB. */
+        if (src[i].addr >= (1ULL<<32))
+            continue;
+
+        /* E820_RAM overlaps with our RAM region. Move it */
+        if (src[i].addr < ram_end) {
+            uint64_t delta;
+
+            src[i].type = E820_UNUSABLE;
+            delta = ram_end - src[i].addr;
+            /* The end < ram_end should weed this out */
+            if (src[i].size - delta < 0)
+                src[i].type = 0;
+            else {
+                src[i].size -= delta;
+                src[i].addr = ram_end;
+            }
+            if (src[i].addr + src[i].size != end) {
+                /* We messed up somewhere */
+                src[i].type = 0;
+#ifdef PRIu64
+                printf( "Computed E820 wrongly. Continuing on.");
+#endif
+            }
+        }
+        /* Lastly, convert the RAM to UNSUABLE. Look in the Linux kernel
+           at git commit 2f14ddc3a7146ea4cd5a3d1ecd993f85f2e4f948
+           "xen/setup: Inhibit resource API from using System RAM E820
+           gaps as PCI mem gaps" for full explanation. */
+        if (end > ram_end)
+            src[i].type = E820_UNUSABLE;
+    }
+
+    /* Check if there is a region between ram_end and start. */
+    if (start > ram_end) {
+        int add_unusable = 1;
+        for (i = 0; i < nr && add_unusable; i++) {
+            if (src[i].type != E820_UNUSABLE)
+                continue;
+            if (ram_end != src[i].addr)
+                continue;
+            if (start != src[i].addr + src[i].size) {
+                /* there is one, adjust it */
+                src[i].size = start - src[i].addr;
+            }
+            add_unusable = 0;
+        }
+        /* .. and if not present, add it in. This is to guard against
+           the Linux guest assuming that the gap between the end of
+           RAM region and the start of the E820_[ACPI,NVS,RESERVED]
+           is PCI I/O space. Which it certainly is _not_. */
+        if (add_unusable) {
+            e820[idx].type = E820_UNUSABLE;
+            e820[idx].addr = ram_end;
+            e820[idx].size = start - ram_end;
+            idx++;
+        }
+    }
+    /* Almost done: copy them over, ignoring the undesireable ones */
+    for (i = 0; i < nr; i++) {
+        if ((src[i].type == E820_RAM) ||
+            (src[i].type == 0))
+            continue;
+
+        e820[idx].type = src[i].type;
+        e820[idx].addr = src[i].addr;
+        e820[idx].size = src[i].size;
+        idx++;
+    }
+    /* At this point we have the mapped RAM + E820 entries from src. */
+    if (balloon_kb) {
+        /* and if we truncated the RAM region, then add it to the end. */
+        e820[idx].type = E820_RAM;
+        e820[idx].addr = (uint64_t)(1ULL << 32) > last ?
+                         (uint64_t)(1ULL << 32) : last;
+        /* also add the balloon memory to the end. */
+        e820[idx].size = (uint64_t)(delta_kb << 10) +
+                         (uint64_t)(balloon_kb << 10);
+        idx++;
+
+    }
+    nr = idx;
+#ifdef PRIu64
+    for (i = 0; i < nr; i++) {
+        printf(":\t[%"PRIx64" -> %"PRIx64"] %s",
+               e820[i].addr >> 12, (e820[i].addr + e820[i].size) >> 12,
+               e820_names(e820[i].type));
+    }
+#endif
+    /* Done: copy the sanitized version. */
+    *nr_entries = nr;
+    memcpy(src, e820, nr * sizeof(struct e820entry));
+    return 0;
+}
+
+static PyObject *pyxc_domain_set_e820_hole(XcObject *self, PyObject *args)
+{
+    uint32_t dom, nr;
+    unsigned int target_kb;
+    unsigned int balloon_kb;
+    int rc;
+    struct e820entry map[E820MAX];
+
+    if ( !PyArg_ParseTuple(args, "iii", &dom, &target_kb, &balloon_kb) )
+        return NULL;
+
+    rc = xc_get_machine_memory_map(self->xc_handle, map, E820MAX);
+    if (rc < 0)
+        return pyxc_error_to_exception(self->xc_handle);
+
+    nr = rc;
+    rc = e820_sanitize(map, &nr, target_kb, balloon_kb);
+    if (rc)
+        return pyxc_error_to_exception(self->xc_handle);
+
+    rc = xc_domain_set_memory_map(self->xc_handle, dom, map, nr);
+    if (rc < 0)
+        return pyxc_error_to_exception(self->xc_handle);
+
+    Py_INCREF(zero);
+    return zero;
+}
+
 static PyObject *pyxc_domain_ioport_permission(XcObject *self,
                                                PyObject *args,
                                                PyObject *kwds)
@@ -2701,6 +2939,15 @@ static PyMethodDef pyxc_methods[] = {
       " map_limitkb [int]: .\n"
       "Returns: [int] 0 on success; -1 on error.\n" },
 
+    { "domain_set_e820_hole",
+      (PyCFunction)pyxc_domain_set_e820_hole,
+      METH_VARARGS, "\n"
+      "Set a domain's E820 memory map\n"
+      " dom [int]: Identifier of domain.\n"
+      " target_memkb [int]: .\n"
+      " balloon_kb [int]: .\n"
+      "Returns: [int] 0 on success; -1 on error.\n" },
+
 #ifdef __ia64__
     { "nvram_init",
       (PyCFunction)pyxc_nvram_init,
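
The core of e820_sanitize() can be modelled in a few lines, which may help when reviewing the C. The sketch below is Python 3 and deliberately simplified: it only covers the scan for the lowest and highest non-RAM entry and the E820_UNUSABLE filler between the end of guest RAM and the first such entry. It ignores the sub-1MB filtering, the Intel IGD handling and the balloon region, and the helper names reserved_span and guest_map are hypothetical. One thing to check against the hunk as posted: the true branch of the `last` ternary is itself a comparison (`... > last`), which in C yields 0 or 1 rather than an end address; the sketch uses a plain maximum, which appears to be what was intended.

```python
# Standard E820 type codes.
E820_RAM, E820_RESERVED, E820_ACPI, E820_NVS, E820_UNUSABLE = 1, 2, 3, 4, 5

SKIPPED = (E820_RAM, E820_UNUSABLE, 0)


def reserved_span(entries):
    """Return (start, last) covering all entries that are not RAM/unusable.

    entries is a list of (addr, size, type) tuples; start is None when no
    such entry exists.
    """
    start, last = None, 0
    for addr, size, typ in entries:
        if typ in SKIPPED:
            continue
        start = addr if start is None else min(start, addr)
        last = max(last, addr + size)
    return start, last


def guest_map(host, ram_kb):
    """Build a simplified guest map: RAM, UNUSABLE filler, host non-RAM entries."""
    start, _ = reserved_span(host)
    ram_end = ram_kb << 10
    if start is not None:
        ram_end = min(ram_end, start)        # trim RAM that would overlap
    out = [(0, ram_end, E820_RAM)]
    if start is not None and start > ram_end:
        # Without this filler a Linux guest would treat the gap as PCI I/O space.
        out.append((ram_end, start - ram_end, E820_UNUSABLE))
    out += sorted(e for e in host if e[2] not in SKIPPED)
    return out


if __name__ == "__main__":
    host = [
        (0x00000000, 0xC0000000, E820_RAM),
        (0xC0000000, 0x00100000, E820_ACPI),
        (0xC0100000, 0x3FF00000, E820_RESERVED),
    ]
    for addr, size, typ in guest_map(host, 2 * 1024 * 1024):   # 2GB guest
        print(hex(addr), hex(addr + size), typ)
```

For a 2GB guest on a host whose first ACPI entry starts at 3GB, the model yields RAM up to 2GB, an unusable region from 2GB to 3GB, and then the host's ACPI and reserved entries.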
Konrad Rzeszutek Wilk
2012-Mar-24 17:41 UTC
[PATCH 6 of 6] xend: Add support for passing in the host's E820 for PCI passthrough
# HG changeset patch
# User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
# Date 1332610898 14400
# Node ID 67f01753ae43f66a1a4b67cbc4952fb716e45c78
# Parent 95eda76084314aa8a5cfd4b5e83969823492deda
xend: Add support for passing in the host's E820 for PCI passthrough

The code that populates E820 is unconditionally triggered by the guest configuration having 'e820_hole=1'.

xend calls the xc_get_machine_memory_map to retrieve the systems E820. Then the E820 is sanitized to weed out E820 entries below 16MB, and as well remove any E820_RAM or E820_UNUSED regions as the guest does not need to know about them. The guest only needs the E820_ACPI, E820_NVS, E820_RESERVED to get an idea of where the PCI I/O space is. Mostly.. The Linux kernel assumes that any gap in the E820 is considered PCI I/O space which means that if we pass in the guest 2GB, and the E820_ACPI, and its friend start at 3GB, the gap between 2GB and 3GB will be considered as PCI I/O space. To guard against that we also create an E820_UNUSABLE between the region of 'target_kb' (called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] region.

Lastly, the xc_domain_set_memory_map is called to install the new E820.

When tested with another PV guest (NetBSD 5.1) the modified E820 gave it no trouble. The code has also been tested with older "classic" Xen Linux and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid, Debian Squeeze, 2.6.37, 2.6.38, 2.6.39).

Memory that is slack or for balloon (so 'maxmem' in guest configuration) is put behind the machine E820. Which in most cases is after the 4GB.

The reason for doing the fetching of the E820 using the hypercall in the toolstack (instead of the guest doing it) is that when a guest would do a hypercall to 'XENMEM_machine_memory_map' it would retrieve an E820 with I/O range caps added in. Meaning that the region after 4GB up to end of possible memory would be marked as unusable and the kernel would not have any space to allocate a balloon region.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

diff -r 95eda7608431 -r 67f01753ae43 tools/python/xen/xend/XendConfig.py
--- a/tools/python/xen/xend/XendConfig.py Sat Mar 24 13:41:38 2012 -0400
+++ b/tools/python/xen/xend/XendConfig.py Sat Mar 24 13:41:38 2012 -0400
@@ -241,6 +241,7 @@ XENAPI_CFG_TYPES = {
     'machine_address_size': int,
     'suppress_spurious_page_faults': bool0,
     's3_integrity' : int,
+    'e820_hole' : int,
     'superpages' : int,
     'memory_sharing': int,
     'pool_name' : str,
@@ -423,6 +424,7 @@ class XendConfig(dict):
             'target': 0,
             'pool_name' : 'Pool-0',
             'superpages': 0,
+            'e820_hole': 0,
             'description': '',
         }
 
@@ -512,6 +514,9 @@ class XendConfig(dict):
         if 'nomigrate' not in self['platform']:
             self['platform']['nomigrate'] = 0
 
+        if 'e820_hole' not in self['platform']:
+            self['platform']['e820_hole'] = 0
+
         if self.is_hvm():
             if 'timer_mode' not in self['platform']:
                 self['platform']['timer_mode'] = 1
@@ -539,6 +544,8 @@ class XendConfig(dict):
                 self['platform']['loader'] = auxbin.pathTo("hvmloader")
             if not os.path.exists(self['platform']['loader']):
                 raise VmError("kernel '%s' not found" % str(self['platform']['loader']))
+            if 'e820_hole' in self['platform'] == 1:
+                raise VmError("e820_hole can only be used with PV guests!")
 
             # Compatibility hack, can go away soon.
             if 'soundhw' not in self['platform'] and \
@@ -2140,6 +2147,8 @@ class XendConfig(dict):
             image.append(['args', self['PV_args']])
         if self.has_key('superpages'):
             image.append(['superpages', self['superpages']])
+        if self.has_key('e820_hole'):
+            image.append(['e820_hole', self['e820_hole']])
 
         for key in XENAPI_PLATFORM_CFG_TYPES.keys():
             if key in self['platform']:
@@ -2184,6 +2193,10 @@ class XendConfig(dict):
         val = sxp.child_value(image_sxp, 'superpages')
         if val is not None:
             self['superpages'] = val
+
+        val = sxp.child_value(image_sxp, 'e820_hole')
+        if val is not None:
+            self['e820_hole'] = val
 
         val = sxp.child_value(image_sxp, 'memory_sharing')
         if val is not None:
diff -r 95eda7608431 -r 67f01753ae43 tools/python/xen/xend/image.py
--- a/tools/python/xen/xend/image.py Sat Mar 24 13:41:38 2012 -0400
+++ b/tools/python/xen/xend/image.py Sat Mar 24 13:41:38 2012 -0400
@@ -705,12 +705,14 @@ class LinuxImageHandler(ImageHandler):
     ostype = "linux"
     flags = 0
     vhpt = 0
+    e820_hole = 0
 
     def configure(self, vmConfig):
         ImageHandler.configure(self, vmConfig)
         self.vramsize = int(vmConfig['platform'].get('videoram',4)) * 1024
         self.is_stubdom = (self.kernel.find('stubdom') >= 0)
         self.superpages = int(vmConfig['superpages'])
+        self.e820_hole = int(vmConfig['e820_hole'])
 
     def buildDomain(self):
         store_evtchn = self.vm.getStorePort()
@@ -731,6 +733,7 @@ class LinuxImageHandler(ImageHandler):
         log.debug("superpages = %d", self.superpages)
         if arch.type == "ia64":
             log.debug("vhpt = %d", self.vhpt)
+        log.debug("e820_hole = %d", self.e820_hole)
 
         return xc.linux_build(domid = self.vm.getDomid(),
                               memsize = mem_mb,
@@ -1072,7 +1075,12 @@ class X86_Linux_ImageHandler(LinuxImageH
         # set physical mapping limit
         # add an 8MB slack to balance backend allocations.
         mem_kb = self.getRequiredMaximumReservation() + (8 * 1024)
-        xc.domain_set_memmap_limit(self.vm.getDomid(), mem_kb)
+        if self.e820_hole:
+            mem_kb = self.getRequiredMaximumReservation();
+            balloon_kb = 8 * 1024;
+            xc.domain_set_e820_hole(self.vm.getDomid(), mem_kb, balloon_kb);
+        else:
+            xc.domain_set_memmap_limit(self.vm.getDomid(), mem_kb)
         rc = LinuxImageHandler.buildDomain(self)
         self.setCpuid()
         return rc
diff -r 95eda7608431 -r 67f01753ae43 tools/python/xen/xm/create.py
--- a/tools/python/xen/xm/create.py Sat Mar 24 13:41:38 2012 -0400
+++ b/tools/python/xen/xm/create.py Sat Mar 24 13:41:38 2012 -0400
@@ -349,6 +349,12 @@ gopts.var('pci', val='BUS:DEV.FUNC[@VSLO
          If power_mgmt is set, the guest OS will be able to program the power
          states D0-D3hot of the device, HVM only. Default=0.""")
 
+gopts.var('e820_hole', val='no|yes',
+          fn=set_int, default=0,
+          use="""Expose hosts' E820 map to PV guest?
+          (Default is 0).""")
+
+
 gopts.var('vscsi', val='PDEV,VDEV[,DOM]',
           fn=append_value, default=[],
           use="""Add a SCSI device to a domain. The physical device is PDEV,
@@ -1143,7 +1149,7 @@ def make_config(vals):
                    'on_reboot', 'on_crash', 'features', 'on_xend_start',
                    'on_xend_stop', 'target', 'cpuid', 'cpuid_check',
                    'machine_address_size', 'suppress_spurious_page_faults',
-                   'description'])
+                   'description', 'e820_hole'])
 
     vcpu_conf()
     if vals.uuid is not None:
diff -r 95eda7608431 -r 67f01753ae43 tools/python/xen/xm/xenapi_create.py
--- a/tools/python/xen/xm/xenapi_create.py Sat Mar 24 13:41:38 2012 -0400
+++ b/tools/python/xen/xm/xenapi_create.py Sat Mar 24 13:41:38 2012 -0400
@@ -285,6 +285,8 @@ class xenapi_create:
                 vm.attributes["s3_integrity"].value,
             "superpages":
                 vm.attributes["superpages"].value,
+            "e820_hole":
+                vm.attributes["e820_hole"].value,
             "memory_static_max":
                 get_child_node_attribute(vm, "memory", "static_max"),
             "memory_static_min":
@@ -697,6 +699,8 @@ class sxp2xml:
             = str(get_child_by_name(config, "s3_integrity", 0))
         vm.attributes["superpages"] \
             = str(get_child_by_name(config, "superpages", 0))
+        vm.attributes["e820_hole"] \
+            = str(get_child_by_name(config, "e820_hole", 0))
         vm.attributes["pool_name"] \
             = str(get_child_by_name(config, "pool_name", "Pool-0"))
 
@@ -1111,7 +1115,8 @@ class sxp2xml:
             'pci_msitranslate',
             'pci_power_mgmt',
             'xen_platform_pci',
-            'tsc_mode'
+            'tsc_mode',
+            'e820_hole',
             'description',
             'nomigrate'
         ]
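
One detail in the XendConfig.py hunk may deserve a second look. Python chains comparison operators, so `'e820_hole' in platform == 1` is evaluated as `('e820_hole' in platform) and (platform == 1)`, and a dict never compares equal to 1. The HVM guard as written would therefore never raise. The sketch below (Python 3; `platform` stands in for `self['platform']`, and both function names are hypothetical) shows the behaviour and one possible spelling of what the check seems to intend.

```python
def check_as_posted(platform):
    """The condition exactly as it appears in the hunk."""
    return 'e820_hole' in platform == 1


def hvm_rejects_e820_hole(platform):
    """One possible intended test: is the option present and set to 1?"""
    return int(platform.get('e820_hole', 0)) == 1


if __name__ == "__main__":
    for cfg in ({'e820_hole': 1}, {'e820_hole': 0}, {}):
        print(cfg, check_as_posted(cfg), hvm_rejects_e820_hole(cfg))
```

The first function returns False for every dict, including one with `e820_hole` set to 1, while the second distinguishes the three cases.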
Jan Beulich
2012-Mar-26 08:43 UTC
Re: [PATCH 2 of 6] xen/vga: Add 'vga_delay' parameter to delay screen output by 2 second per screen output
>>> On 24.03.12 at 18:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> # HG changeset patch
> # User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> # Date 1332608052 14400
> # Node ID d097c3ba42f601af65b53a0c84973855aab64aa9
> # Parent 708bf485e5d4baaf1c6272bd4fb1895a3c355d3e
> xen/vga: Add 'vga_delay' parameter to delay screen output by 2 second per
> screen output.
>
> This is useful if you find yourself on machine that has no serial console,
> nor any PCI, PCIe to put in a serial card. Nothing really fancy except it
> allows
> to capture the screenshot of the screen using a camera.
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> diff -r 708bf485e5d4 -r d097c3ba42f6 xen/drivers/video/vga.c
> --- a/xen/drivers/video/vga.c Sat Mar 24 12:54:11 2012 -0400
> +++ b/xen/drivers/video/vga.c Sat Mar 24 12:54:12 2012 -0400
> @@ -10,7 +10,7 @@
>  #include <xen/mm.h>
>  #include <xen/vga.h>
>  #include <asm/io.h>
> -
> +#include <xen/delay.h>
>  /* Filled in by arch boot code. */
>  struct xen_vga_console_info vga_console_info;
>
> @@ -49,6 +49,8 @@ void (*vga_puts)(const char *) = vga_noo
>  static char __initdata opt_vga[30] = "";
>  string_param("vga", opt_vga);
>
> +static bool_t __read_mostly vga_delay;
> +boolean_param("vga_delay", vga_delay);
>  /* VGA text-mode definitions. */
>  static unsigned int columns, lines;
>  #define ATTRIBUTE 7
> @@ -128,6 +130,9 @@ static void vga_text_puts(const char *s)
>
>      while ( (c = *s++) != '\0' )
>      {
> +        if (vga_delay)
> +            udelay(2000);

Subject and implementation aren't in sync - udelay()'s argument is in microseconds, so the delay above is 2ms. I wonder whether that's really useful for taking photos of the screen.

Oh, wait, this is per character, not per line. The description should probably warn about this being potentially unsafe.

And if any such would really be allowed to go in, I'd favor the command line option to specify a numeric value (e.g. in milliseconds) rather than forcing a fixed value of 2ms.

Jan

> +
>          if ( (c == '\n') || (xpos >= columns) )
>          {
>              if ( ++ypos >= lines )
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
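
To put numbers on the point about units: with udelay(2000) applied to every character, the cost scales with the amount of text rather than with the number of lines. The sketch below (Python 3) works through the arithmetic for an 80x25 text screen, which is an assumed geometry, and models the numeric-option alternative suggested above as a delay applied only at newlines. Both function names and the `delay_ms` parameter are hypothetical and not part of the patch.

```python
def full_screen_delay_s(columns, lines, udelay_per_char):
    """Seconds spent in delays while filling a whole text screen."""
    return columns * lines * udelay_per_char / 1_000_000


def delay_us_for(char, delay_ms):
    """Hypothetical per-line policy: pause only when a newline is printed."""
    return delay_ms * 1000 if char == "\n" else 0


if __name__ == "__main__":
    # Patch as posted: 2000 microseconds for each of 80 * 25 characters.
    print(full_screen_delay_s(80, 25, 2000))
    # Alternative: a 2000 ms option charged once per line.
    print(sum(delay_us_for(c, 2000) for c in "line one\nline two\n"))
```

Under these assumptions a full screen of output costs about four seconds in total with the per-character delay, whereas a per-line option lets the pause be chosen independently of line length.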
Jan Beulich
2012-Mar-26 08:50 UTC
Re: [PATCH 3 of 6] xen/pat: After suspend re-write PAT if BIOS changed it
>>> On 24.03.12 at 18:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> # HG changeset patch
> # User Simon Graham <simon.graham@virtualcomputer.com>
> # Date 1332610898 14400
> # Node ID 75798a472b1a9121adda166b6fd05ba8473a44f0
> # Parent d097c3ba42f601af65b53a0c84973855aab64aa9
> xen/pat: After suspend re-write PAT if BIOS changed it.
>
> Certain AMD machines (this was a MSI or GigaBYTE BIOS) after resume
> would reset the PAT MSR causing rather weird issues - where
> the pages would (say they would be set to WC) would end up with the
> wrong type (as they would use the BIOS PAT instead of the one set by
> the hypervisor).

There's a write of the PAT MSR already at the end of restore_rest_processor_state() - are you saying this doesn't do what is needed? Also note that this is properly gated by a check of cpu_has_pat (other than the patch here does).

> Signed-off-by: Simon Graham <simon.graham@virtualcomputer.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> diff -r d097c3ba42f6 -r 75798a472b1a xen/arch/x86/acpi/power.c
> --- a/xen/arch/x86/acpi/power.c Sat Mar 24 12:54:12 2012 -0400
> +++ b/xen/arch/x86/acpi/power.c Sat Mar 24 13:41:38 2012 -0400
> @@ -41,8 +41,25 @@ static DEFINE_SPINLOCK(pm_lock);
>
>  struct acpi_sleep_info acpi_sinfo;
>
> +static void pat_resume(void);
>  void do_suspend_lowlevel(void);
>
> +static void
> +pat_resume()
> +{
> +    u64 pat;
> +
> +    rdmsrl(MSR_IA32_CR_PAT, pat);
> +    if (pat != host_pat) {
> +        printk(KERN_INFO PREFIX "Found PAT MSR: 0x%lx\n", pat);
> +        printk(KERN_INFO PREFIX "reseting to 0x%lx\n", host_pat);
> +        wrmsrl(MSR_IA32_CR_PAT, host_pat);
> +        rdmsrl(MSR_IA32_CR_PAT, pat);
> +        if (pat != host_pat)
> +            printk(KERN_WARNING PREFIX "PAT MSR stuck on: 0x%lx\n", pat);

All the %lx format specifiers here would break the 32-bit build afaict. Further (if this code really is needed at all), please use %# instead of 0x%.

Jan

> +    }
> +}
> +
>  static int device_power_down(void)
>  {
>      console_suspend();
> @@ -194,6 +211,7 @@ static int enter_state(u32 state)
>      if ( cpu_has_efer )
>          write_efer(read_efer());
>
> +    pat_resume();
>      device_power_up();
>
>      mcheck_init(&boot_cpu_data, 0);
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
Ian Campbell
2012-Mar-26 09:42 UTC
Re: [PATCH 5 of 6] xend/xc: Implement a domain_set_e820_hole function to be used by python code
On Sat, 2012-03-24 at 17:41 +0000, Konrad Rzeszutek Wilk wrote:
> # HG changeset patch
> # User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> # Date 1332610898 14400
> # Node ID 95eda76084314aa8a5cfd4b5e83969823492deda
> # Parent d42921da3931026ecf5da7c0e5bb86074e77cf71
> xend/xc: Implement a domain_set_e820_hole function to be used by python code
>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> diff -r d42921da3931 -r 95eda7608431 tools/python/xen/lowlevel/xc/xc.c
> --- a/tools/python/xen/lowlevel/xc/xc.c Sat Mar 24 13:41:38 2012 -0400
> +++ b/tools/python/xen/lowlevel/xc/xc.c Sat Mar 24 13:41:38 2012 -0400
> @@ -16,6 +16,7 @@
>  #include <sys/mman.h>
>  #include <netdb.h>
>  #include <arpa/inet.h>
> +#include <stdio.h>
>
>  #include "xenctrl.h"
>  #include <xen/elfnote.h>
> @@ -1697,6 +1698,243 @@ static PyObject *pyxc_domain_set_memmap_
>      return zero;
>  }
>
> +#ifdef PRIu64

This is a really weird condition -- when / where does this end up not defined? Perhaps you see this depending on platform? In which case is this just a case of including the right header (stdint.h?) directly instead of implicitly via some arch dependent chain of includes?

> +static const char *e820_names(int type)
> +{
> +    switch (type) {
> +        case E820_RAM: return "RAM";
> +        case E820_RESERVED: return "Reserved";
> +        case E820_ACPI: return "ACPI";
> +        case E820_NVS: return "ACPI NVS";
> +        case E820_UNUSABLE: return "Unusable";
> +        default: break;
> +    }
> +    return "Unknown";
> +}
> +#endif
> +static int e820_sanitize(struct e820entry src[],
> +                         uint32_t *nr_entries,
> +                         unsigned long map_limitkb,
> +                         unsigned long balloon_kb)
> +{

Seems odd to do this in the C bindings, can this be done either in the python layer or in the libxc layer (in which case libxl can use it too?)

[...]

> +}
> +
> +static PyObject *pyxc_domain_set_e820_hole(XcObject *self, PyObject *args)
> +{
> +    uint32_t dom, nr;
> +    unsigned int target_kb;
> +    unsigned int balloon_kb;
> +    int rc;
> +    struct e820entry map[E820MAX];
> +
> +    if ( !PyArg_ParseTuple(args, "iii", &dom, &target_kb, &balloon_kb) )
> +        return NULL;
> +
> +    rc = xc_get_machine_memory_map(self->xc_handle, map, E820MAX);
> +    if (rc < 0)
> +        return pyxc_error_to_exception(self->xc_handle);
> +
> +    nr = rc;
> +    rc = e820_sanitize(map, &nr, target_kb, balloon_kb);
> +    if (rc)
> +        return pyxc_error_to_exception(self->xc_handle);
> +
> +    rc = xc_domain_set_memory_map(self->xc_handle, dom, map, nr);
> +    if (rc < 0)
> +        return pyxc_error_to_exception(self->xc_handle);
> +
> +    Py_INCREF(zero);
> +    return zero;
> +}
> +
>  static PyObject *pyxc_domain_ioport_permission(XcObject *self,
>                                                 PyObject *args,
>                                                 PyObject *kwds)
> @@ -2701,6 +2939,15 @@ static PyMethodDef pyxc_methods[] = {
>        " map_limitkb [int]: .\n"
>        "Returns: [int] 0 on success; -1 on error.\n" },
>
> +    { "domain_set_e820_hole",
> +      (PyCFunction)pyxc_domain_set_e820_hole,
> +      METH_VARARGS, "\n"
> +      "Set a domain's E820 memory map\n"
> +      " dom [int]: Identifier of domain.\n"
> +      " target_memkb [int]: .\n"
> +      " balloon_kb [int]: .\n"
> +      "Returns: [int] 0 on success; -1 on error.\n" },
> +
>  #ifdef __ia64__
>      { "nvram_init",
>        (PyCFunction)pyxc_nvram_init,
>
>
Ian Jackson
2012-Mar-27 13:39 UTC
Re: [PATCH 4 of 6] xend: Don't crash due to weird PCI devices
Konrad Rzeszutek Wilk writes ("[PATCH 4 of 6] xend: Don't crash due to weird PCI devices"):
> xend: Don't crash due to weird PCI devices

This seems plausible.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2012-Mar-27 13:39 UTC
Re: [PATCH 1 of 6] linux-xencommons: Load xen-acpi-processor
Konrad Rzeszutek Wilk writes ("[PATCH 1 of 6] linux-xencommons: Load xen-acpi-processor"):
> linux-xencommons: Load xen-acpi-processor

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2012-Mar-27 13:41 UTC
Re: [PATCH 2 of 6] xen/vga: Add 'vga_delay' parameter to delay screen output by 2 second per screen output
Konrad Rzeszutek Wilk writes ("[PATCH 2 of 6] xen/vga: Add 'vga_delay' parameter to delay screen output by 2 second per screen output"):
> xen/vga: Add 'vga_delay' parameter to delay screen output by 2 second per screen output.

This seems to be 2 seconds per character, which is really quite slow.
At the very least, couldn't it be a variable delay value defaulting
to 0 ?

Ian.
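To illustrate the "variable delay value defaulting to 0" idea, here is a rough sketch of how an option value could be turned into a delay. Everything in it is an assumption: parse_delay_ms is a hypothetical name, the unit (milliseconds) is chosen arbitrarily, and it uses hosted libc calls that the hypervisor itself does not have. It only shows the intended behaviour: no option or a bad value means no delay.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the patch): turn the text of a
 * "vga_delay=<milliseconds>" style option into a delay value.
 * A missing or malformed value yields 0, i.e. no delay at all.
 */
static unsigned int parse_delay_ms(const char *arg)
{
    char *end;
    unsigned long v;

    if (arg == NULL || !isdigit((unsigned char)arg[0]))
        return 0;               /* option absent or not a number */

    v = strtoul(arg, &end, 10);
    if (*end != '\0')
        return 0;               /* trailing junk: fall back to the default */

    /* Clamp rather than wrap if the value does not fit. */
    return v > UINT_MAX ? UINT_MAX : (unsigned int)v;
}
```

A caller would then skip the delay loop entirely whenever the parsed value is 0, so the facility costs nothing unless it is explicitly requested.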
Ian Jackson
2012-Mar-27 13:43 UTC
Re: [PATCH 6 of 6] xend: Add support for passing in the host's E820 for PCI passthrough
Konrad Rzeszutek Wilk writes ("[PATCH 6 of 6] xend: Add support for passing in the host's E820 for PCI passthrough"):
> xend: Add support for passing in the host's E820 for PCI passthrough
...
>
> The code that populates E820 is unconditionally triggered by the guest
> configuration having 'e820_hole=1'.

Isn't this similar to some xl/libxl features ? "e820_host" maybe.

We certainly don't want to add any new features to xend with
incompatible config settings to xl.

Ian.
Konrad Rzeszutek Wilk
2012-Apr-02 15:28 UTC
Re: [PATCH 3 of 6] xen/pat: After suspend re-write PAT if BIOS changed it
On Mon, Mar 26, 2012 at 09:50:11AM +0100, Jan Beulich wrote:
> >>> On 24.03.12 at 18:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > # HG changeset patch
> > # User Simon Graham <simon.graham@virtualcomputer.com>
> > # Date 1332610898 14400
> > # Node ID 75798a472b1a9121adda166b6fd05ba8473a44f0
> > # Parent d097c3ba42f601af65b53a0c84973855aab64aa9
> > xen/pat: After suspend re-write PAT if BIOS changed it.
> >
> > Certain AMD machines (this was a MSI or GigaBYTE BIOS) after resume
> > would reset the PAT MSR causing rather weird issues - where
> > the pages would (say they would be set to WC) would end up with the
> > wrong type (as they would use the BIOS PAT instead of the one set by
> > the hypervisor).
>
> There's a write of the PAT MSR already at the end of
> restore_rest_processor_state() - are you saying this doesn't do
> what is needed? Also note that this is properly gated by a check
> of cpu_has_pat (other than the patch here does).

Let me double-check with folks at VirtualComputer - but they had
experienced this with Xen 4.0 (I think) and the c/s 19167 certainly
was in there.

> > Signed-off-by: Simon Graham <simon.graham@virtualcomputer.com>
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >
> > diff -r d097c3ba42f6 -r 75798a472b1a xen/arch/x86/acpi/power.c
> > --- a/xen/arch/x86/acpi/power.c Sat Mar 24 12:54:12 2012 -0400
> > +++ b/xen/arch/x86/acpi/power.c Sat Mar 24 13:41:38 2012 -0400
> > @@ -41,8 +41,25 @@ static DEFINE_SPINLOCK(pm_lock);
> >
> >  struct acpi_sleep_info acpi_sinfo;
> >
> > +static void pat_resume(void);
> >  void do_suspend_lowlevel(void);
> >
> > +static void
> > +pat_resume()
> > +{
> > +    u64 pat;
> > +
> > +    rdmsrl(MSR_IA32_CR_PAT, pat);
> > +    if (pat != host_pat) {
> > +        printk(KERN_INFO PREFIX "Found PAT MSR: 0x%lx\n", pat);
> > +        printk(KERN_INFO PREFIX "reseting to 0x%lx\n", host_pat);
> > +        wrmsrl(MSR_IA32_CR_PAT, host_pat);
> > +        rdmsrl(MSR_IA32_CR_PAT, pat);
> > +        if (pat != host_pat)
> > +            printk(KERN_WARNING PREFIX "PAT MSR stuck on: 0x%lx\n", pat);
>
> All the %lx format specifiers here would break the 32-bit build afaict.
> Further (if this code really is needed at all), please use %# instead
> of 0x%.

Right.
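For reference, the format-specifier point can be sketched in plain C. With "%lx" the argument is expected to be an unsigned long, which is only 32 bits wide on a 32-bit build, while the PAT value is a 64-bit quantity; a PRIx64-style macro picks the right length modifier on either build, and the "#" flag supplies the "0x" prefix. The sketch below assumes a hosted C99 environment, with <inttypes.h> standing in for whatever the hypervisor build provides; format_pat is a hypothetical helper, not code from the patch:

```c
#include <inttypes.h>   /* PRIx64 */
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch): format a 64-bit MSR value.
 * "%#" PRIx64 replaces the hand-written "0x%lx": PRIx64 carries the
 * correct length modifier, and '#' adds the 0x prefix for nonzero values.
 */
static int format_pat(char *buf, size_t len, uint64_t pat)
{
    return snprintf(buf, len, "Found PAT MSR: %#" PRIx64, pat);
}
```

One behavioural difference worth knowing: with the "#" flag a value of zero is printed as plain "0", without the prefix.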
Konrad Rzeszutek Wilk
2012-Apr-02 15:59 UTC
Re: [PATCH 5 of 6] xend/xc: Implement a domain_set_e820_hole function to be used by python code
On Mon, Mar 26, 2012 at 10:42:59AM +0100, Ian Campbell wrote:
> On Sat, 2012-03-24 at 17:41 +0000, Konrad Rzeszutek Wilk wrote:
> > # HG changeset patch
> > # User Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > # Date 1332610898 14400
> > # Node ID 95eda76084314aa8a5cfd4b5e83969823492deda
> > # Parent d42921da3931026ecf5da7c0e5bb86074e77cf71
> > xend/xc: Implement a domain_set_e820_hole function to be used by python code
> >
> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >
> > diff -r d42921da3931 -r 95eda7608431 tools/python/xen/lowlevel/xc/xc.c
> > --- a/tools/python/xen/lowlevel/xc/xc.c Sat Mar 24 13:41:38 2012 -0400
> > +++ b/tools/python/xen/lowlevel/xc/xc.c Sat Mar 24 13:41:38 2012 -0400
> > @@ -16,6 +16,7 @@
> >  #include <sys/mman.h>
> >  #include <netdb.h>
> >  #include <arpa/inet.h>
> > +#include <stdio.h>
> >
> >  #include "xenctrl.h"
> >  #include <xen/elfnote.h>
> > @@ -1697,6 +1698,243 @@ static PyObject *pyxc_domain_set_memmap_
> >      return zero;
> >  }
> >
> > +#ifdef PRIu64
>
> This is a really weird condition -- when / where does this end up not
> defined?

32-bit, but not sure now if the problem still exists.

> Perhaps you see this depending on platform? In which case is this just a
> case of including the right header (stdint.h?) directly instead of
> implicitly via some arch dependent chain of includes?

<nods> That is probably what I hit and never tried to resolve.

> > +static const char *e820_names(int type)
> > +{
> > +    switch (type) {
> > +        case E820_RAM: return "RAM";
> > +        case E820_RESERVED: return "Reserved";
> > +        case E820_ACPI: return "ACPI";
> > +        case E820_NVS: return "ACPI NVS";
> > +        case E820_UNUSABLE: return "Unusable";
> > +        default: break;
> > +    }
> > +    return "Unknown";
> > +}
> > +#endif
> > +static int e820_sanitize(struct e820entry src[],
> > +                         uint32_t *nr_entries,
> > +                         unsigned long map_limitkb,
> > +                         unsigned long balloon_kb)
> > +{
>
> Seems odd to do this in the C bindings, can this be done either in the
> python layer or in the libxc layer (in which case libxl can use it too?)

So this is copied from the libxl layer (With the removal of the
libxl_ctx). I was hoping you could shed some ideas of how to "export" that
function (e820_sanitize) from the libxl_pci.c so that the
tools/python/xen/lowlevel/xc/xc.c can also use it?
Konrad Rzeszutek Wilk
2012-Apr-02 16:00 UTC
Re: [PATCH 6 of 6] xend: Add support for passing in the host's E820 for PCI passthrough
On Tue, Mar 27, 2012 at 02:43:56PM +0100, Ian Jackson wrote:
> Konrad Rzeszutek Wilk writes ("[PATCH 6 of 6] xend: Add support for passing in the host's E820 for PCI passthrough"):
> > xend: Add support for passing in the host's E820 for PCI passthrough
> ...
> >
> > The code that populates E820 is unconditionally triggered by the guest
> > configuration having 'e820_hole=1'.
>
> Isn't this similar to some xl/libxl features ? "e820_host" maybe.

Yes. It is called 'e820_host'. Sorry about that - I had used an earlier
version of the patches to make this work on the python stack.

> We certainly don't want to add any new features to xend with
> incompatible config settings to xl.

Of course not.

> Ian.
Ian Campbell
2012-Apr-02 16:02 UTC
Re: [PATCH 5 of 6] xend/xc: Implement a domain_set_e820_hole function to be used by python code
On Mon, 2012-04-02 at 16:59 +0100, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 26, 2012 at 10:42:59AM +0100, Ian Campbell wrote:
> >
> > > +static const char *e820_names(int type)
> > > +{
> > > +    switch (type) {
> > > +        case E820_RAM: return "RAM";
> > > +        case E820_RESERVED: return "Reserved";
> > > +        case E820_ACPI: return "ACPI";
> > > +        case E820_NVS: return "ACPI NVS";
> > > +        case E820_UNUSABLE: return "Unusable";
> > > +        default: break;
> > > +    }
> > > +    return "Unknown";
> > > +}
> > > +#endif
> > > +static int e820_sanitize(struct e820entry src[],
> > > +                         uint32_t *nr_entries,
> > > +                         unsigned long map_limitkb,
> > > +                         unsigned long balloon_kb)
> > > +{
> >
> > Seems odd to do this in the C bindings, can this be done either in the
> > python layer or in the libxc layer (in which case libxl can use it too?)
>
> So this is copied from the libxl layer (With the removal of the
> libxl_ctx). I was hoping you could shed some ideas of how to "export" that
> function (e820_sanitize) from the libxl_pci.c so that the
> tools/python/xen/lowlevel/xc/xc.c can also use it?

The only way would be to move this functionality into libxc instead of
libxl.

Ian.
Tom Goetz
2012-Apr-03 13:03 UTC
Re: [PATCH 3 of 6] xen/pat: After suspend re-write PAT if BIOS changed it
On Apr 2, 2012, at 11:28 AM, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 26, 2012 at 09:50:11AM +0100, Jan Beulich wrote:
>>>>> On 24.03.12 at 18:41, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
>>> # HG changeset patch
>>> # User Simon Graham <simon.graham@virtualcomputer.com>
>>> # Date 1332610898 14400
>>> # Node ID 75798a472b1a9121adda166b6fd05ba8473a44f0
>>> # Parent d097c3ba42f601af65b53a0c84973855aab64aa9
>>> xen/pat: After suspend re-write PAT if BIOS changed it.
>>>
>>> Certain AMD machines (this was a MSI or GigaBYTE BIOS) after resume
>>> would reset the PAT MSR causing rather weird issues - where
>>> the pages would (say they would be set to WC) would end up with the
>>> wrong type (as they would use the BIOS PAT instead of the one set by
>>> the hypervisor).
>>
>> There's a write of the PAT MSR already at the end of
>> restore_rest_processor_state() - are you saying this doesn't do
>> what is needed? Also note that this is properly gated by a check
>> of cpu_has_pat (other than the patch here does).
>
> Let me double-check with folks at VirtualComputer - but they had
> experienced this with Xen 4.0 (I think) and the c/s 19167 certainly
> was in there.

We've been carrying this patch since before Xen 3.4 days. Given Jan's
information we will remove it, retest, and get back to you.