George Dunlap
2013-Jun-21 16:08 UTC
[PATCH v5 0/8] Relocate devices rather than memory for qemu-xen
This is the fifth version of a patch series to address the issue of
qemu-xen not being able to handle moving guest memory in order to
resize the lowmem MMIO hole.

The only changes from v4 have to do with adding and correcting
comments.  A brief summary can be seen below:

A   1/8 hvmloader: Remove all 64-bit print arguments
A   2/8 hvmloader: Make the printfs more informative
A   3/8 hvmloader: Set up highmem resource appropriately if there is no RAM above 4G
A   4/8 hvmloader: Fix check for needing a 64-bit bar
A-  5/8 hvmloader: Correct bug in low mmio region accounting
A   6/8 hvmloader: Load large devices into high MMIO space as needed
A   7/8 hvmloader: Remove minimum size for BARs to relocate to 64-bit space
a-  8/8 libxl,hvmloader: Don't relocate memory for MMIO hole

Key:
 -: Changes in v5
 a: Reviewed / Acked by 1 person
 A: Reviewed / Acked by 2 or more people

= The situation =

The default MMIO hole for Xen systems starts at 0xf0000000; this
leaves just under 256MiB of space for PCI devices.  At the moment,
hvmloader will scan the PCI bus for devices and resize this hole, up
to 2GiB (i.e., starting at 0x80000000), to make space, relocating any
overlapping guest memory above 4GiB (0x100000000).

(After that point, if there is still not enough space, the intention
seemed to be that it would begin mapping devices with 64-bit-capable
BARs into high memory as well, just above the end of RAM; however,
there seems to be a bug in the code which detects this condition, so
it is likely that the 64-bit remapping code could never be
triggered.)

We expect the default MMIO hole to be insufficient only when passing
through devices to guests.

This works fine for qemu-traditional, but qemu-xen unfortunately has
expectations of where guest memory will be, and will get confused if
it moves.  If hvmloader does relocate guest RAM, then at some point
qemu will try to map that pfn space, resulting in a seg fault and
qemu crashing.

hvmloader of course will only move RAM if it would overlap the MMIO
region; this means that if the guest has a small enough amount of RAM
-- say, 2GiB -- then this "memory move" condition will also never be
triggered.

So at the moment, under the following conditions:

* A user is passing through a device or set of devices requiring more
  MMIO space than the default MMIO hole
* The user has enough memory that resizing the hole will overlap
  guest memory
* The user is using qemu-xen (not qemu-traditional)

the user will experience qemu crashing shortly after boot.

= The proposed fix =

This patch series makes the following functional changes.

The core change is this:

* When running qemu-xen, don't resize the MMIO hole; instead rely on
  devices being moved into the 64-bit MMIO region.

In order to make this more effective, we also make the following
changes to the 64-bit relocation code:

* Allow devices smaller than 512MiB to be relocated to the high MMIO
  region.
* When moving devices into the 64-bit MMIO region, start with the
  ones with the largest BARs, and only relocate them if there is not
  enough space for all the remaining BARs.

= Risk analysis =

There are two kinds of risks: risks due to unintended changes (i.e.,
a bug in the patch itself), and risks due to intended changes.  We
hope that we can address the first by a combination of testing and
code review.  The rest of this analysis will assume that the patch is
correct, and will try to do a risk analysis on the effects of the
patch.
The main risk is that moving some devices into 64-bit memory will
cause problems with the operation of those devices.  Relocating a
device may have the following outcomes:

1. In the best case, the relocated device will Just Work.
2. A relocated device may fail in a way that leaves the OS intact:
   the guest OS may not be able to see it, or the driver may not
   load.
3. A relocated device may fail in a way that crashes the guest OS:
   the driver may crash, or one of the relocated devices which fails
   may be system-critical.
4. A relocated device may fail in a way which is unpredictable, but
   does not cause data loss: crashing the guest randomly at some
   point in the future, or causing strange quirks in functionality
   (e.g., network connectivity dropping, glitches when watching
   video).
5. A relocated device may fail in a way that is unpredictable, and
   corrupts data.

Outcomes 1-3 are equivalent to or strictly better than crashing
within a few minutes of boot.  Outcome 4 is arguably also not much
worse.  The main risk to our users would be #5.  However:

- This is definitely a bug in the driver, OS, or the hardware
- This is a bug that might also be seen running on real hardware, or
  in KVM (or some other hypervisor)
- This is not a bug that we would be likely to catch, even if we had
  a full development cycle's worth of testing.

I think we should therefore not worry about #5, and consider in
general that relocating a device into 64-bit space will be no worse,
and potentially better, than crashing within a few minutes of boot.

There is another risk with this method, which is that a user may end
up passing through a number of devices with non-64-bit BARs such that
the devices cannot all fit in the default lowmem MMIO region, but
also cannot be remapped into the 64-bit region.  If this is the case,
then some devices will simply not be able to be mapped.  If these
non-mapped devices are system-critical, the VM will not boot; if they
are not, then the devices will simply be invisible.  Both of these
are no worse than, and potentially better than, crashing within a few
minutes of boot.

Starting with all VMs:

Any VM running in PV mode will be unaffected.

Any VM running in HVM mode but not passing through devices will be
unaffected.

Any VM running in HVM mode and passing through devices that fit
inside the default MMIO space will be unaffected.

Any VM running in HVM mode, passing through devices that require less
than 2GiB of MMIO space, *and* having a low enough amount of guest
memory that the MMIO hole can be enlarged without moving guest
memory, will be unaffected.  (For example, if you need 512MiB and you
have <3584 MiB of guest RAM; or if you need 1024MiB and have <3072
MiB of guest RAM.)

Any VM running in HVM mode, passing through devices requiring less
than 2GiB of MMIO space, and using qemu-traditional will be
unaffected.

For a VM running in HVM mode, passing through devices which require
more than 2GiB of MMIO space, using qemu-traditional, and having more
than 2GiB of guest memory:

* We believe that at the moment, because of a bug in hvmloader (fixed
  in this series), no devices will be mapped in 64-bit space;
  instead, the smallest devices will simply not be mapped.  This will
  likely cause critical platform devices not to be mapped, causing
  the VM not to be able to boot.
* With this patch, the largest devices *will* be remapped into 64-bit
  space.
* If we are right that the current code will fail, this is a uniform
  improvement, even if the devices don't work.
* If the current code would work, then a different set of devices
  will be remapped to high memory.  This may change some
  configurations from "works" into "doesn't work".  I think this is a
  small enough contingent of users that this is an acceptable amount
  of risk to take.

We have now covered all configurations of qemu-traditional.  Since
xend only knows how to use qemu-traditional, this also covers all
configurations using xend.

For VMs running in HVM mode, using qemu-xen, but not using libxl,
this patch will have no effect: qemu-xen will crash.  (NB that this
cannot include xend, as it only knows how to drive qemu-traditional.)
This can be worked around by using qemu-traditional instead, or by
setting the appropriate xenstore key on boot.  This is acceptable,
because this is not really a supported configuration; users should
either use libxl or one of the other supported toolstacks.

We have now covered all users of any non-libxl-based toolstack.

For a VM running in HVM mode, using qemu-xen, using libxl, and
passing through devices such that the required 32-bit-only MMIO space
does not fit in the default MMIO hole, and with enough memory that
resizing the MMIO hole requires moving guest RAM:

* At the moment, hvmloader will relocate guest memory.  This will
  cause qemu-xen to crash within a few minutes.
* With this change, the devices with the smallest BARs will simply
  not be mapped.  If these devices are non-critical, they will simply
  be invisible to the OS; if these devices are critical, the OS will
  not boot.

Crashing immediately or having non-visible devices are the same as or
better than crashing a few minutes into boot, so this is an
improvement (or at least not a regression).

For a VM running in HVM mode, using qemu-xen, using libxl, having a
required 32-bit-only MMIO space that does fit within the default MMIO
hole, but a total MMIO space that does not, and having enough memory
that resizing the MMIO hole requires moving guest RAM:

* At the moment, hvmloader will relocate memory.  This will cause
  qemu-xen to crash within a few minutes of booting.  Note that this
  is true whether the total MMIO space is less than 2GiB or more.
* With this change, devices with the largest BARs will be relocated
  to 64-bit space.  We expect that in general, the devices thus
  relocated will be the passed-through PCI devices.

We have decided already to consider any outcome of mapping a device
into a 64-bit address space to be no worse than, and potentially
better than, qemu-xen crashing; so this can be considered an
improvement.

We have now covered all possible configurations.  In summary:

* The vast majority of configurations are unaffected.
* For those that are affected, the vast majority are either a strict
  improvement on, or no worse than, the status quo.
* There is a slight possibility that in one extreme corner case
  (using qemu-traditional with >2GiB of MMIO space), we may be
  changing "works" into "fails".  I think this is an acceptable risk.

Therefore, I think the risks posed by this change are acceptable.

CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
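As a footnote, the hole-sizing policy the series ends up with can be
sketched roughly as follows.  This is an illustrative simplification,
not the literal hvmloader code: the function name, parameters and the
hard-coded hole end are made up for the example.

    #include <stdint.h>
    #include <stdbool.h>

    /* Rough sketch of the lowmem-hole sizing policy described above.
     * Names and the hard-coded hole end are illustrative only. */
    uint32_t size_lowmem_hole(uint64_t mmio_needed, uint32_t low_mem_end,
                              bool allow_memory_relocate)
    {
        const uint32_t hole_end = 0xfc000000u;  /* fixed top of the lowmem hole */
        uint32_t hole_start = 0xf0000000u;      /* default: just under 256MiB   */

        /* Grow the hole by lowering its start (0xf0000000 -> 0xe0000000 ->
         * 0xc0000000 -> 0x80000000, i.e. up to a 2GiB hole).  If guest-memory
         * relocation is disallowed (qemu-xen), also stop before the hole
         * would dip below the top of low guest RAM. */
        while ( mmio_needed > hole_end - hole_start &&
                (uint32_t)(hole_start << 1) != 0 &&
                (allow_memory_relocate ||
                 (uint32_t)(hole_start << 1) >= low_mem_end) )
            hole_start <<= 1;

        return hole_start;  /* BARs that still don't fit go to 64-bit space,
                             * largest first */
    }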
George Dunlap
2013-Jun-21 16:08 UTC
[PATCH v5 1/8] hvmloader: Remove all 64-bit print arguments
The printf() available to hvmloader does not handle 64-bit data
types; manually break them down as two 32-bit strings.

v4:
 - Make macros for the requisite format and bit shifting

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c  | 11 +++++++----
 tools/firmware/hvmloader/util.h |  2 ++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index c78d4d3..c1cb1e9 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -290,8 +290,9 @@ void pci_setup(void)
 
         if ( (base < resource->base) || (base > resource->max) )
         {
-            printf("pci dev %02x:%x bar %02x size %llx: no space for "
-                   "resource!\n", devfn>>3, devfn&7, bar_reg, bar_sz);
+            printf("pci dev %02x:%x bar %02x size "PRIllx": no space for "
+                   "resource!\n", devfn>>3, devfn&7, bar_reg,
+                   PRIllx_arg(bar_sz));
             continue;
         }
 
@@ -300,8 +301,10 @@ void pci_setup(void)
         pci_writel(devfn, bar_reg, bar_data);
         if (using_64bar)
             pci_writel(devfn, bar_reg + 4, bar_data_upper);
-        printf("pci dev %02x:%x bar %02x size %llx: %08x\n",
-               devfn>>3, devfn&7, bar_reg, bar_sz, bar_data);
+        printf("pci dev %02x:%x bar %02x size "PRIllx": %08x\n",
+               devfn>>3, devfn&7, bar_reg,
+               PRIllx_arg(bar_sz),
+               bar_data);
 
         /* Now enable the memory or I/O mapping. */
 
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 7913259..9ccb905 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -168,6 +168,8 @@ void byte_to_hex(char *digits, uint8_t byte);
 void uuid_to_string(char *dest, uint8_t *uuid);
 
 /* Debug output */
+#define PRIllx "%x%08x"
+#define PRIllx_arg(ll) (uint32_t)((ll)>>32), (uint32_t)(ll)
 int printf(const char *fmt, ...) __attribute__ ((format (printf, 1, 2)));
 int vprintf(const char *fmt, va_list ap);
 
-- 
1.7.9.5
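To illustrate what the new macros do, here is a small stand-alone
example; it uses the host's standard printf purely for demonstration,
and the BAR size is made up:

    #include <stdint.h>
    #include <stdio.h>

    #define PRIllx "%x%08x"
    #define PRIllx_arg(ll) (uint32_t)((ll)>>32), (uint32_t)(ll)

    int main(void)
    {
        uint64_t bar_sz = 0x140000000ULL;   /* e.g. a 5GiB BAR */

        /* The 64-bit value is passed as two 32-bit arguments, so a printf
         * without %llx support can still print it; this prints "140000000". */
        printf("bar size "PRIllx"\n", PRIllx_arg(bar_sz));
        return 0;
    }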
George Dunlap
2013-Jun-21 16:08 UTC
[PATCH v5 2/8] hvmloader: Make the printfs more informative
* Warn that you're relocating some BARs to 64-bit
* Warn that you're relocating guest pages, and how many
* Include upper 32-bits of the base register when printing the bar
  placement info

v4:
 - Move message about relocating guest pages into loop, include number
   of pages and guest paddr
 - Fixed minor brace style issue

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index c1cb1e9..44168e2 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -214,7 +214,11 @@ void pci_setup(void)
         pci_mem_start <<= 1;
 
     if ( (pci_mem_start << 1) != 0 )
+    {
+        printf("Low MMIO hole not large enough for all devices,"
+               " relocating some BARs to 64-bit\n");
         bar64_relocate = 1;
+    }
 
     /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
     while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
@@ -227,6 +231,11 @@ void pci_setup(void)
         if ( hvm_info->high_mem_pgend == 0 )
             hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
         hvm_info->low_mem_pgend -= nr_pages;
+        printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
+               " for lowmem MMIO hole\n",
+               nr_pages,
+               PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT),
+               PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT));
         xatp.domid = DOMID_SELF;
         xatp.space = XENMAPSPACE_gmfn_range;
         xatp.idx = hvm_info->low_mem_pgend;
@@ -301,10 +310,10 @@ void pci_setup(void)
         pci_writel(devfn, bar_reg, bar_data);
         if (using_64bar)
             pci_writel(devfn, bar_reg + 4, bar_data_upper);
-        printf("pci dev %02x:%x bar %02x size "PRIllx": %08x\n",
+        printf("pci dev %02x:%x bar %02x size "PRIllx": %x%08x\n",
                devfn>>3, devfn&7, bar_reg,
                PRIllx_arg(bar_sz),
-               bar_data);
+               bar_data_upper, bar_data);
 
         /* Now enable the memory or I/O mapping. */
 
-- 
1.7.9.5
George Dunlap
2013-Jun-21 16:08 UTC
[PATCH v5 3/8] hvmloader: Set up highmem resource appropriately if there is no RAM above 4G
hvmloader will read hvm_info->high_mem_pgend to calculate where to
start the highmem PCI region.  However, if the guest does not have
any memory in the high region, this is set to zero, which will cause
hvmloader to use 0 for the base of the highmem region, rather than
1 << 32.

Check to see whether hvm_info->high_mem_pgend is set; if so, do the
normal calculation; otherwise, use 1 << 32.

v4:
 - Handle case where hvm_info->high_mem_pgend is non-zero but doesn't
   point into high memory, throwing a warning.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 44168e2..a3d03ed 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -246,7 +246,18 @@ void pci_setup(void)
         hvm_info->high_mem_pgend += nr_pages;
     }
 
-    high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
+    high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
+    if ( high_mem_resource.base < 1ull << 32 )
+    {
+        if ( hvm_info->high_mem_pgend != 0 )
+            printf("WARNING: hvm_info->high_mem_pgend %x"
+                   " does not point into high memory!",
+                   hvm_info->high_mem_pgend);
+        high_mem_resource.base = 1ull << 32;
+    }
+    printf("%sRAM in high memory; setting high_mem resource base to "PRIllx"\n",
+           hvm_info->high_mem_pgend?"":"No ",
+           PRIllx_arg(high_mem_resource.base));
     high_mem_resource.max = 1ull << cpu_phys_addr();
     mem_resource.base = pci_mem_start;
     mem_resource.max = pci_mem_end;
-- 
1.7.9.5
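A small stand-alone illustration of the failure mode being fixed; the
guest size is hypothetical (PAGE_SHIFT of 12 matches hvmloader):

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12

    int main(void)
    {
        /* Hypothetical guest with only 2GiB of RAM: nothing lives above
         * 4GiB, so the toolstack leaves high_mem_pgend at 0. */
        uint64_t high_mem_pgend = 0;

        uint64_t base = high_mem_pgend << PAGE_SHIFT;  /* == 0: unusable base */
        if ( base < (1ull << 32) )
            base = 1ull << 32;                         /* start just above 4GiB */

        printf("high MMIO region starts at %#llx\n", (unsigned long long)base);
        return 0;
    }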
George Dunlap
2013-Jun-21 16:08 UTC
[PATCH v5 4/8] hvmloader: Fix check for needing a 64-bit bar
After attempting to resize the MMIO hole, the check to determine
whether there is a need to relocate BARs into 64-bit space checks the
specific thing that caused the loop to exit (MMIO hole == 2GiB)
rather than checking whether the required MMIO will fit in the hole.
But even then it does it wrong: the polarity of the check is
backwards.

Check for the actual condition we care about (the size of the MMIO
hole) rather than checking for the loop exit condition.

v3:
 - Move earlier in the series, before other functional changes

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index a3d03ed..6792ed4 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -213,7 +213,7 @@ void pci_setup(void)
             ((pci_mem_start << 1) != 0) )
         pci_mem_start <<= 1;
 
-    if ( (pci_mem_start << 1) != 0 )
+    if ( mmio_total > (pci_mem_end - pci_mem_start) )
     {
         printf("Low MMIO hole not large enough for all devices,"
                " relocating some BARs to 64-bit\n");
-- 
1.7.9.5
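A worked example of the wrong polarity (values are illustrative; the
wrap-around on the shift is what terminates the resize loop, since
pci_mem_start is a 32-bit quantity in hvmloader):

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        /* Scenario: the resize loop has already pushed the hole to its 2GiB
         * limit (pci_mem_start == 0x80000000) and the BARs still don't fit. */
        uint32_t pci_mem_start = 0x80000000u, pci_mem_end = 0xfc000000u;
        uint64_t mmio_total = 0x90000000ull;   /* 2.25GiB of BARs requested */

        /* Old check: true only when the loop stopped *before* the 2GiB
         * limit, i.e. exactly when relocation was not needed. */
        int old_bar64_relocate = ((uint32_t)(pci_mem_start << 1) != 0);

        /* Fixed check: relocate whenever the BARs genuinely don't fit. */
        int new_bar64_relocate = (mmio_total > (pci_mem_end - pci_mem_start));

        assert(old_bar64_relocate == 0 && new_bar64_relocate == 1);
        return 0;
    }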
George Dunlap
2013-Jun-21 16:08 UTC
[PATCH v5 5/8] hvmloader: Correct bug in low mmio region accounting
When deciding whether to map a device in low MMIO space (<4GiB),
hvmloader compares it with "mmio_left", which is set to the size of
the low MMIO range (pci_mem_end - pci_mem_start).  However, even if
it does map a device in high MMIO space, it still removes the size of
its BAR from mmio_left.

In reality we don't need to do a separate accounting of the low
memory available -- this can be calculated from mem_resource.  Just
get rid of the variable and the duplicate accounting entirely.  This
will make the code more robust.

Note also that the calculation of whether to move a device to 64-bit
is fragile at the moment, depending on some unstated assumptions.
State those assumptions in a comment for future reference.

v5:
 - Add comment documenting fragility of the move-to-highmem check
v3:
 - Use mem_resource values directly instead of doing duplicate
   accounting

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 6792ed4..1fe250d 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -42,7 +42,6 @@ void pci_setup(void)
     uint32_t vga_devfn = 256;
     uint16_t class, vendor_id, device_id;
     unsigned int bar, pin, link, isa_irq;
-    int64_t mmio_left;
 
     /* Resources assignable to PCI devices via BARs. */
     struct resource {
@@ -264,8 +263,6 @@ void pci_setup(void)
     io_resource.base = 0xc000;
     io_resource.max = 0x10000;
 
-    mmio_left = pci_mem_end - pci_mem_start;
-
     /* Assign iomem and ioport resources in descending order of size. */
     for ( i = 0; i < nr_bars; i++ )
     {
@@ -273,7 +270,21 @@ void pci_setup(void)
         bar_reg = bars[i].bar_reg;
         bar_sz = bars[i].bar_sz;
 
-        using_64bar = bars[i].is_64bar && bar64_relocate && (mmio_left < bar_sz);
+        /*
+         * NB: The code here is rather fragile, as the check here to see
+         * whether bar_sz will fit in the low MMIO region doesn't match the
+         * real check made below, which involves aligning the base offset of the
+         * bar with the size of the bar itself.  As it happens, this will always
+         * be satisfied because:
+         * - The first one will succeed because the MMIO hole can only start at
+         *   0x{f,e,c,8}0000000.  If it fits, it will be aligned properly.
+         * - All subsequent ones will be aligned because the list is ordered
+         *   large to small, and bar_sz is always a power of 2.  (At least
+         *   the code here assumes it to be.)
+         * Should either of those two conditions change, this code will break.
+         */
+        using_64bar = bars[i].is_64bar && bar64_relocate
+            && (bar_sz > (mem_resource.max - mem_resource.base));
         bar_data = pci_readl(devfn, bar_reg);
 
         if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
@@ -295,7 +306,6 @@ void pci_setup(void)
                 resource = &mem_resource;
                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             }
-            mmio_left -= bar_sz;
         }
         else
         {
-- 
1.7.9.5
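The alignment assumption documented in the new comment can be
sanity-checked with a small stand-alone program; the BAR sizes below
are made up, but follow the "power of two, sorted large to small"
rule:

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        /* Starting from a hole base that can only be 0x{f,e,c,8}0000000,
         * handing out power-of-two BAR sizes in descending order keeps the
         * running base aligned for every BAR, so the simple "does it fit"
         * check agrees with the real placement done later. */
        uint64_t base = 0x80000000ull;   /* example hole base */
        uint64_t bar_sz[] = { 0x10000000, 0x10000000, 0x2000000, 0x100000, 0x100000 };

        for ( unsigned int i = 0; i < sizeof(bar_sz) / sizeof(bar_sz[0]); i++ )
        {
            assert( (base & (bar_sz[i] - 1)) == 0 );  /* no realignment needed */
            base += bar_sz[i];
        }
        return 0;
    }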
George Dunlap
2013-Jun-21 16:08 UTC
[PATCH v5 6/8] hvmloader: Load large devices into high MMIO space as needed
Keep track of how much mmio space is left total, as well as the
amount of "low" MMIO space (<4GiB), and only load devices into high
memory if there is not enough low memory for the rest of the devices
to fit.

Because devices are processed by size in order from large to small,
this should preferentially relocate devices with large BARs to 64-bit
space.

v3:
 - Just use mmio_total rather than introducing a new variable.
 - Port to using mem_resource directly rather than low_mmio_left

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 1fe250d..68d4e44 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -271,6 +271,11 @@ void pci_setup(void)
         bar_sz = bars[i].bar_sz;
 
         /*
+         * Relocate to high memory if the total amount of MMIO needed
+         * is more than the low MMIO available.  Because devices are
+         * processed in order of bar_sz, this will preferentially
+         * relocate larger devices to high memory first.
+         *
          * NB: The code here is rather fragile, as the check here to see
          * whether bar_sz will fit in the low MMIO region doesn't match the
         * real check made below, which involves aligning the base offset of the
@@ -284,7 +289,7 @@ void pci_setup(void)
          * Should either of those two conditions change, this code will break.
          */
         using_64bar = bars[i].is_64bar && bar64_relocate
-            && (bar_sz > (mem_resource.max - mem_resource.base));
+            && (mmio_total > (mem_resource.max - mem_resource.base));
         bar_data = pci_readl(devfn, bar_reg);
 
         if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
@@ -306,6 +311,7 @@ void pci_setup(void)
                 resource = &mem_resource;
                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             }
+            mmio_total -= bar_sz;
         }
         else
         {
-- 
1.7.9.5
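A toy model of this accounting, with made-up BAR and hole sizes,
shows that only the largest BARs end up above 4GiB while everything
else still fits in the low hole (it assumes all BARs are 64-bit
capable and bar64_relocate is set):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* Example: a 1.5GiB low hole and BARs already sorted large to small. */
        uint64_t low_mmio = 0x60000000;
        uint64_t bar_sz[] = { 0x40000000, 0x40000000, 0x10000000, 0x1000000 };
        uint64_t mmio_total = 0x91000000;   /* sum of the BAR sizes above */

        for ( unsigned int i = 0; i < sizeof(bar_sz) / sizeof(bar_sz[0]); i++ )
        {
            /* A BAR goes high only while the *remaining* total doesn't fit. */
            int high = (mmio_total > low_mmio);
            printf("BAR of size %#llx -> %s\n", (unsigned long long)bar_sz[i],
                   high ? "64-bit region" : "low MMIO hole");
            mmio_total -= bar_sz[i];
        }
        return 0;
    }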
George Dunlap
2013-Jun-21 16:08 UTC
[PATCH v5 7/8] hvmloader: Remove minimum size for BARs to relocate to 64-bit space
Allow devices with BARs less than 512MiB to be relocated to high
memory.  This will only be invoked if there is not enough low MMIO
space to map the device, and will be done preferentially to large
devices first; so in all likelihood only large devices will be
remapped anyway.

This is needed to work around the issue of qemu-xen not being able to
handle moving guest memory around to resize the MMIO hole.  The
default MMIO hole size is less than 256MiB.

v3:
 - Fixed minor style issue

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/config.h | 1 -
 tools/firmware/hvmloader/pci.c    | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index 8143d6f..6641197 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -55,7 +55,6 @@ extern struct bios_config ovmf_config;
 /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
 #define PCI_MEM_START 0xf0000000
 #define PCI_MEM_END 0xfc000000
-#define PCI_MIN_BIG_BAR_SIZE 0x20000000
 
 extern unsigned long pci_mem_start, pci_mem_end;
 
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 68d4e44..9253b0b 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -295,9 +295,8 @@ void pci_setup(void)
         if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
              PCI_BASE_ADDRESS_SPACE_MEMORY )
         {
-            /* Mapping high memory if PCI deivce is 64 bits bar and the bar size
-               is larger than 512M */
-            if (using_64bar && (bar_sz > PCI_MIN_BIG_BAR_SIZE)) {
+            /* Mapping high memory if PCI device is 64 bits bar */
+            if ( using_64bar ) {
                 if ( high_mem_resource.base & (bar_sz - 1) )
                     high_mem_resource.base = high_mem_resource.base -
                         (high_mem_resource.base & (bar_sz - 1)) + bar_sz;
-- 
1.7.9.5
George Dunlap
2013-Jun-21 16:08 UTC
[PATCH v5 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
At the moment, qemu-xen can't handle memory being relocated by
hvmloader.  This may happen if a device with a large enough memory
region is passed through to the guest.  At the moment, if this
happens, then at some point in the future qemu will crash and the
domain will hang.  (qemu-traditional is fine.)

It's too late in the release to do a proper fix, so we try to do
damage control.

hvmloader already has mechanisms to relocate device memory to 64-bit
space if it can't make a big enough MMIO hole.  By default the hole
can grow to 2GiB; if we just refuse to make the hole bigger when it
would overlap with guest memory, then that relocation will happen by
default.

v5:
 - Update comment to not refer to "this series".
v4:
 - Wrap long line in libxl_dm.c
 - Fix comment
v3:
 - Fix polarity of comparison
 - Move diagnostic messages to another patch
 - Tested with xen platform pci device hacked to have different BAR
   sizes {256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various
   memory configurations
 - Add comment explaining why we default to "allow"
 - Remove cast to bool
v2:
 - style fixes
 - fix and expand comment on the MMIO hole loop
 - use "%d" rather than "%s" -> (...)?"1":"0"
 - use bool instead of uint8_t
 - Move 64-bit bar relocate detection to another patch
 - Add more diagnostic messages

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c          | 49 +++++++++++++++++++++++++++++--
 tools/libxl/libxl_dm.c                  |  8 +++++
 xen/include/public/hvm/hvm_xs_strings.h |  1 +
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 9253b0b..627e8cb 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -27,6 +27,8 @@
 
 #include <xen/memory.h>
 #include <xen/hvm/ioreq.h>
+#include <xen/hvm/hvm_xs_strings.h>
+#include <stdbool.h>
 
 unsigned long pci_mem_start = PCI_MEM_START;
 unsigned long pci_mem_end = PCI_MEM_END;
@@ -57,6 +59,32 @@ void pci_setup(void)
     } *bars = (struct bars *)scratch_start;
     unsigned int i, nr_bars = 0;
 
+    const char *s;
+    /*
+     * Do we allow hvmloader to relocate guest memory in order to
+     * increase the size of the lowmem MMIO hole?  Defaulting to 1
+     * here means that, for non-libxl toolstacks (including xend and
+     * home-grown ones), those using qemu-xen will still experience
+     * the memory relocation bug described below; but it also means
+     * that those using qemu-traditional will *not* experience any
+     * change; and it also means that there is a work-around for
+     * those using qemu-xen, namely switching to qemu-traditional.
+     *
+     * If we defaulted to 0, and failing to resize the hole caused any
+     * problems with qemu-traditional, then there is no work-around.
+     *
+     * Since xend can only use qemu-traditional, I think this is the
+     * option that will have the least impact.
+     */
+    bool allow_memory_relocate = 1;
+
+    s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL);
+    if ( s )
+        allow_memory_relocate = strtoll(s, NULL, 0);
+    printf("Relocating guest memory for lowmem MMIO space %s\n",
+           allow_memory_relocate?"enabled":"disabled");
+
     /* Program PCI-ISA bridge with appropriate link routes. */
     isa_irq = 0;
     for ( link = 0; link < 4; link++ )
@@ -208,8 +236,25 @@ void pci_setup(void)
         pci_writew(devfn, PCI_COMMAND, cmd);
     }
 
-    while ( (mmio_total > (pci_mem_end - pci_mem_start)) &&
-            ((pci_mem_start << 1) != 0) )
+    /*
+     * At the moment qemu-xen can't deal with relocated memory regions.
+     * It's too close to the release to make a proper fix; for now,
+     * only allow the MMIO hole to grow large enough to move guest memory
+     * if we're running qemu-traditional.  Items that don't fit will be
+     * relocated into the 64-bit address space.
+     *
+     * This loop now does the following:
+     * - If allow_memory_relocate, increase the MMIO hole until it's
+     *   big enough, or until it's 2GiB
+     * - If !allow_memory_relocate, increase the MMIO hole until it's
+     *   big enough, or until it's 2GiB, or until it overlaps guest
+     *   memory
+     */
+    while ( (mmio_total > (pci_mem_end - pci_mem_start))
+            && ((pci_mem_start << 1) != 0)
+            && (allow_memory_relocate
+                || (((pci_mem_start << 1) >> PAGE_SHIFT)
+                    >= hvm_info->low_mem_pgend)) )
         pci_mem_start <<= 1;
 
     if ( mmio_total > (pci_mem_end - pci_mem_start) )
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index ac1f90e..7e54c02 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1154,6 +1154,14 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss)
         libxl__xs_write(gc, XBT_NULL,
                         libxl__sprintf(gc, "%s/hvmloader/bios", path),
                         "%s", libxl_bios_type_to_string(b_info->u.hvm.bios));
+        /* Disable relocating memory to make the MMIO hole larger
+         * unless we're running qemu-traditional */
+        libxl__xs_write(gc, XBT_NULL,
+                        libxl__sprintf(gc,
+                                       "%s/hvmloader/allow-memory-relocate",
+                                       path),
+                        "%d",
+                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);
         free(path);
     }
 
diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
index 9042303..4de5881 100644
--- a/xen/include/public/hvm/hvm_xs_strings.h
+++ b/xen/include/public/hvm/hvm_xs_strings.h
@@ -28,6 +28,7 @@
 #define HVM_XS_HVMLOADER "hvmloader"
 #define HVM_XS_BIOS "hvmloader/bios"
 #define HVM_XS_GENERATION_ID_ADDRESS "hvmloader/generation-id-address"
+#define HVM_XS_ALLOW_MEMORY_RELOCATE "hvmloader/allow-memory-relocate"
 
 /* The following values allow additional ACPI tables to be added to the
  * virtual ACPI BIOS that hvmloader constructs. The values specify the guest
-- 
1.7.9.5
Ian Jackson
2013-Jun-24 11:27 UTC
Re: [PATCH v5 0/8] Relocate devices rather than memory for qemu-xen
George Dunlap writes ("[PATCH v5 0/8] Relocate devices rather than memory for qemu-xen"):> This is the fifth version of a patch series to address the issue of > qemu-xen not being able to handle moving guest memory in order to > resize the lowmem MMIO hole. > > The only changes from v4 have to do with adding and correcting > comments.I have applied this series, thanks. Ian.
Ian Campbell
2013-Jun-25 13:23 UTC
Re: [PATCH v5 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
On Fri, 2013-06-21 at 17:08 +0100, George Dunlap wrote:
>
> diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
> index 9042303..4de5881 100644
> --- a/xen/include/public/hvm/hvm_xs_strings.h
> +++ b/xen/include/public/hvm/hvm_xs_strings.h
> @@ -28,6 +28,7 @@
>  #define HVM_XS_HVMLOADER "hvmloader"
>  #define HVM_XS_BIOS "hvmloader/bios"
>  #define HVM_XS_GENERATION_ID_ADDRESS "hvmloader/generation-id-address"
> +#define HVM_XS_ALLOW_MEMORY_RELOCATE "hvmloader/allow-memory-relocate"

Please could you also follow up with a patch to
docs/misc/xenstore-paths.markdown describing this key.

Ian.