George Dunlap
2013-Jun-20 16:33 UTC
[PATCH 0/8] Relocate devices rather than memory for qemu-xen
This is the third version of a patch series to address the issue of qemu-xen not being able to handle moving guest memory in order to resize the lowmem MMIO hole. This series adds and relocates several patches. A brief summary can be seen below:

 *  1/8 hvmloader: Remove all 64-bit print arguments
 *  2/8 hvmloader: Make the printfs more informative
 *  3/8 hvmloader: Set up highmem resource appropriately if there is no RAM above 4G
 AM 4/8 hvmloader: Fix check for needing a 64-bit bar
    5/8 hvmloader: Correct bug in low mmio region accounting
    6/8 hvmloader: Load large devices into high MMIO space as needed
 A  7/8 hvmloader: Remove minimum size for BARs to relocate to 64-bit space
    8/8 libxl,hvmloader: Don't relocate memory for MMIO hole

Key:
 *: New in v3
 M: Moved to a different place in the series
 A: Reviewed / Acked by more than 2 people

= The situation =

The default MMIO hole for Xen systems starts at 0xf0000000; this leaves just under 256MiB of space for PCI devices. At the moment, hvmloader will scan the PCI bus for devices and resize this hole, up to 2GiB (i.e., starting at 0x80000000), to make space, relocating any overlapping guest memory above 4GiB (0x100000000). (After that point, if there is still not enough space, the intention seems to have been that it would begin mapping devices with 64-bit-capable BARs into high memory as well, just above the end of RAM; however, there is a bug in the code which detects this condition, so it is likely that the 64-bit remapping code could never be triggered.)

We expect the default MMIO hole to be insufficient only when passing through devices to guests. This works fine for qemu-traditional, but qemu-xen unfortunately has expectations of where guest memory will be, and will get confused if it moves. If hvmloader does relocate guest RAM, then at some point qemu will try to map that pfn space, resulting in a seg fault and qemu crashing.

hvmloader will of course only move RAM if it would overlap the MMIO region; this means that if the guest has a small enough amount of RAM -- say, 2GiB -- then this "memory move" condition will never be triggered. So at the moment, under the following conditions:

* A user is passing through a device or set of devices requiring more MMIO space than the default MMIO hole
* The user has enough memory that resizing the hole will overlap guest memory
* The user is using qemu-xen (not qemu-traditional)

the user will experience qemu crashing shortly after boot.

= The proposed fix =

This patch series makes the following functional changes. The core change is this:

* When running qemu-xen, don't resize the MMIO hole; instead rely on devices being moved into the 64-bit MMIO region.

In order to make this more effective, we also make the following changes to the 64-bit relocation code:

* Allow devices smaller than 512MiB to be relocated to the high MMIO region.
* When moving devices into the 64-bit MMIO region, start with the ones with the largest BARs, and only relocate them if there is not enough space for all the remaining BARs.

= Risk analysis =

There are two kinds of risks: risks due to unintended changes (i.e., a bug in the patch itself), and risks due to intended changes. We hope that we can address the first by a combination of testing and code review. The rest of this analysis will assume that the patch is correct, and will try to do a risk analysis on the effects of the patch.

The main risk is that moving some devices into 64-bit memory will cause problems with the operation of those devices. Relocating a device may have the following outcomes:

1. In the best case, the relocated device will Just Work.
2. A relocated device may fail in a way that leaves the OS intact: the guest OS may not be able to see it, or the driver may not load.
3. A relocated device may fail in a way that crashes the guest OS: the driver may crash, or one of the relocated devices which fails may be system-critical.
4. A relocated device may fail in a way which is unpredictable, but does not cause data loss: crashing the guest randomly at some point in the future, or causing strange quirks in functionality (e.g., network connectivity dropping, glitches when watching video).
5. A relocated device may fail in a way that is unpredictable, and corrupts data.

Outcomes 1-3 are equivalent to or strictly better than crashing within a few minutes of boot. Outcome 4 is arguably also not much worse. The main risk to our users would be #5. However:

- This is definitely a bug in the driver, OS, or the hardware
- This is a bug that might be seen running on real hardware, or on KVM (or some other hypervisor)
- This is not a bug that we would be likely to catch, even if we had a full development cycle's worth of testing

I think we should therefore not worry about #5, and consider in general that relocating a device into 64-bit space will be no worse, and potentially better, than crashing within a few minutes of boot.

There is another risk with this method, which is that a user may end up passing through a number of devices with non-64-bit BARs such that the devices cannot all fit in the default lowmem MMIO region, but also cannot be remapped into the 64-bit region. If this is the case, then some devices will simply not be able to be mapped. If these non-mapped devices are system-critical, the VM will not boot; if they are not, then the devices will simply be invisible. Both of these are either no worse than, or potentially better than, crashing within a few minutes of boot.

Starting with all VMs:

Any VM running in PV mode will be unaffected.

Any VM running in HVM mode but not passing through devices will be unaffected.

Any VM running in HVM mode and passing through devices that fit inside the default MMIO space will be unaffected.

Any VM running in HVM mode, passing through devices that require less than 2GiB of MMIO space, *and* having a low enough amount of guest memory that the MMIO hole can be enlarged without moving guest memory, will be unaffected. (For example, if you need 512MiB and you have <3584 MiB of guest RAM; or if you need 1024MiB and have <3072 MiB of guest RAM.)

Any VM running in HVM mode, passing through devices requiring less than 2GiB of MMIO space, and using qemu-traditional will be unaffected.

For a VM running in HVM mode, passing through devices which require more than 2GiB of MMIO space, using qemu-traditional, and having more than 2GiB of guest memory:
* We believe that at the moment, because of a bug in hvmloader (fixed in this series), no devices will be mapped in 64-bit space; instead, the smallest devices will simply not be mapped. This will likely cause critical platform devices not to be mapped, causing the VM not to be able to boot.
* With this patch, the largest devices *will* be remapped into 64-bit space.
* If we are right that the current code will fail, this is a uniform improvement, even if the devices don't work.
* If the current code would work, then a different set of devices will be re-mapped to high memory. This may change some configurations from "works" into "doesn't work". I think this is a small enough contingent of users that this is an acceptable amount of risk to take.

We have now covered all configurations of qemu-traditional. Since xend only knows how to use qemu-traditional, this also covers all configurations using xend.

For VMs running in HVM mode, using qemu-xen, but not using libxl, this patch will have no effect: qemu-xen will crash. (NB that this cannot include xend, as it only knows how to drive qemu-traditional.) This can be worked around by using qemu-traditional instead, or by setting the appropriate xenstore key on boot. This is acceptable, because this is not really a supported configuration; users should use one of the supported toolstacks, or use libxl.

We have now covered all users of any non-libxl-based toolstack.

For VMs running in HVM mode, using qemu-xen, using libxl, passing through devices such that the required 32-bit-only MMIO space does not fit in the default MMIO hole, and with enough memory that resizing the MMIO hole requires moving guest RAM:
* At the moment, hvmloader will relocate guest memory. This will cause qemu-xen to crash within a few minutes.
* With this change, the devices with the smallest BARs will simply not be mapped. If these devices are non-critical, they will simply be invisible to the OS; if they are critical, the OS will not boot.

Crashing immediately or having non-visible devices are the same as or better than crashing a few minutes into boot, so this is an improvement (or at least not a regression).

For a VM running in HVM mode, using qemu-xen, using libxl, having a required 32-bit-only MMIO space that does fit within the default MMIO hole, but a total MMIO space that does not, and having enough memory that resizing the MMIO hole requires moving guest RAM:
* At the moment, hvmloader will relocate memory. This will cause qemu-xen to crash within a few minutes of booting. Note that this is true whether the total MMIO space is less than 2GiB or more.
* With this change, devices with the largest BARs will be relocated to 64-bit space. We expect that in general, the devices thus relocated will be the passed-through PCI devices. We have decided already to consider any outcome of mapping a device into a 64-bit address space to be no worse than, and potentially better than, qemu-xen crashing; so this can be considered an improvement.

We have now covered all possible configurations.

In summary:
* The vast majority of configurations are unaffected.
* For those that are affected, the vast majority are either a strict improvement on, or no worse than, the status quo.
* There is a slight possibility that in one extreme corner case (using qemu-traditional with >2GiB of MMIO space), we may be changing "works" into "fails". I think this is an acceptable risk.

Therefore, I think the risks posed by this change are acceptable.
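For reference, the hole-resizing behaviour described above can be sketched in C. This is an illustrative model only, not the hvmloader source; `grow_mmio_hole` and its parameters are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of the hole sizing described above: the hole start
 * begins at 0xf0000000 and is moved down in power-of-two steps until
 * either the hole is big enough or the start reaches 2GiB (0x80000000).
 * All names here are invented for the example. */
static uint32_t grow_mmio_hole(uint32_t hole_end, uint64_t mmio_needed)
{
    uint32_t start = 0xf0000000u;

    /* (start << 1) != 0 in 32-bit arithmetic only while
     * start > 0x80000000, i.e. while the hole can still be grown. */
    while ( (mmio_needed > (uint64_t)(hole_end - start)) &&
            ((start << 1) != 0) )
        start <<= 1;    /* 0xf0000000 -> 0xe0000000 -> ... -> 0x80000000 */

    return start;
}
```

With the default hole end of 0xfc000000, asking for 1GiB of MMIO yields a hole starting at 0x80000000; asking for more than the ~1.9GiB available there bottoms out at 0x80000000 anyway, which is exactly the case where relocation (of memory, or of BARs to 64-bit space) has to kick in.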
George Dunlap
2013-Jun-20 16:33 UTC
[PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments
The printf() available to hvmloader does not handle 64-bit data types; manually break them down as two 32-bit strings.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index c78d4d3..aa54bc1 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -290,8 +290,9 @@ void pci_setup(void)
 
         if ( (base < resource->base) || (base > resource->max) )
         {
-            printf("pci dev %02x:%x bar %02x size %llx: no space for "
-                   "resource!\n", devfn>>3, devfn&7, bar_reg, bar_sz);
+            printf("pci dev %02x:%x bar %02x size %x%08x: no space for "
+                   "resource!\n", devfn>>3, devfn&7, bar_reg,
+                   (uint32_t)(bar_sz>>32), (uint32_t)bar_sz);
             continue;
         }
 
@@ -300,8 +301,10 @@ void pci_setup(void)
         pci_writel(devfn, bar_reg, bar_data);
         if (using_64bar)
             pci_writel(devfn, bar_reg + 4, bar_data_upper);
-        printf("pci dev %02x:%x bar %02x size %llx: %08x\n",
-               devfn>>3, devfn&7, bar_reg, bar_sz, bar_data);
+        printf("pci dev %02x:%x bar %02x size %x%08x: %08x\n",
+               devfn>>3, devfn&7, bar_reg,
+               (uint32_t)(bar_sz>>32), (uint32_t)bar_sz,
+               bar_data);
 
         /* Now enable the memory or I/O mapping. */
-- 
1.7.9.5
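The splitting trick used by the patch can be illustrated on its own. This is a sketch; `format_u64` is invented here, but the `"%x%08x"` format and the two casts mirror the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the patch's workaround: print a 64-bit value using only
 * 32-bit printf arguments by emitting the upper and lower halves
 * separately.  The lower half must be zero-padded to exactly 8 hex
 * digits ("%08x"), or interior zeroes would be lost. */
static void format_u64(char *buf, size_t n, uint64_t v)
{
    snprintf(buf, n, "%x%08x", (uint32_t)(v >> 32), (uint32_t)v);
}
```

One cosmetic difference from a real `%llx`: values below 2^32 are printed with a leading `0` from the upper half plus eight padded digits (e.g. 0x100 comes out as `000000100`), which is part of why Stefano asks later in the thread whether implementing `%llx` would be cleaner.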
George Dunlap
2013-Jun-20 16:33 UTC
[PATCH v3 2/8] hvmloader: Make the printfs more informative
* Warn that you're relocating some BARs to 64-bit

* Warn that you're relocating guest pages, and how many

* Include upper 32-bits of the base register when printing the bar
  placement info

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index aa54bc1..d8592b0 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -213,10 +213,17 @@ void pci_setup(void)
             ((pci_mem_start << 1) != 0) )
         pci_mem_start <<= 1;
 
-    if ( (pci_mem_start << 1) != 0 )
+    if ( (pci_mem_start << 1) != 0 ) {
+        printf("Low MMIO hole not large enough for all devices,"
+               " relocating some BARs to 64-bit\n");
         bar64_relocate = 1;
+    }
 
     /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
+    if ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
+        printf("Relocating 0x%lx pages to highmem for lowmem MMIO hole\n",
+               hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT));
+
     while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
     {
         struct xen_add_to_physmap xatp;
@@ -301,10 +308,10 @@ void pci_setup(void)
         pci_writel(devfn, bar_reg, bar_data);
         if (using_64bar)
             pci_writel(devfn, bar_reg + 4, bar_data_upper);
-        printf("pci dev %02x:%x bar %02x size %x%08x: %08x\n",
+        printf("pci dev %02x:%x bar %02x size %x%08x: %x%08x\n",
                devfn>>3, devfn&7, bar_reg,
                (uint32_t)(bar_sz>>32), (uint32_t)bar_sz,
-               bar_data);
+               bar_data_upper, bar_data);
 
         /* Now enable the memory or I/O mapping. */
-- 
1.7.9.5
George Dunlap
2013-Jun-20 16:33 UTC
[PATCH v3 3/8] hvmloader: Set up highmem resource appropriately if there is no RAM above 4G
hvmloader will read hvm_info->high_mem_pgend to calculate where to start the highmem PCI region. However, if the guest does not have any memory in the high region, this is set to zero, which will cause hvmloader to use 0 for the base of the highmem region, rather than 1 << 32.

Check whether hvm_info->high_mem_pgend is set; if so, do the normal calculation; otherwise, use 1 << 32.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index d8592b0..f0f11e2 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -244,7 +244,14 @@ void pci_setup(void)
         hvm_info->high_mem_pgend += nr_pages;
     }
 
-    high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
+    if ( hvm_info->high_mem_pgend )
+        high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
+    else
+        high_mem_resource.base = 1ull << 32;
+    printf("%sRAM in high memory; setting high_mem resource base to %x%08x\n",
+           hvm_info->high_mem_pgend?"":"No ",
+           (uint32_t)(high_mem_resource.base>>32),
+           (uint32_t)high_mem_resource.base);
     high_mem_resource.max = 1ull << cpu_phys_addr();
     mem_resource.base = pci_mem_start;
     mem_resource.max = pci_mem_end;
-- 
1.7.9.5
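The selection of the highmem base can be modelled in isolation (an illustrative sketch; `high_mem_base` is an invented helper, PAGE_SHIFT assumed to be 12):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KiB pages, as in hvmloader */

/* Sketch of the fix: pick the base of the high MMIO region.  If the
 * guest has RAM above 4GiB, start just past it; if high_mem_pgend is 0
 * (no highmem RAM at all), start at exactly 4GiB rather than at
 * address 0, which is what the unfixed code would compute. */
static uint64_t high_mem_base(uint32_t high_mem_pgend)
{
    return high_mem_pgend ? ((uint64_t)high_mem_pgend << PAGE_SHIFT)
                          : (1ull << 32);
}
```

For example, a guest whose highmem RAM ends at pfn 0x140000 (5GiB) gets a high MMIO base of 0x140000000, while a guest with no highmem RAM gets 0x100000000.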
George Dunlap
2013-Jun-20 16:33 UTC
[PATCH v3 4/8] hvmloader: Fix check for needing a 64-bit bar
After attempting to resize the MMIO hole, the check to determine whether there is a need to relocate BARs into 64-bit space checks the specific thing that caused the loop to exit (MMIO hole == 2GiB) rather than checking whether the required MMIO will fit in the hole. But even then it does it wrong: the polarity of the check is backwards.

Check for the actual condition we care about (the size of the MMIO hole) rather than checking for the loop exit condition.

v3:
 - Move earlier in the series, before other functional changes

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index f0f11e2..4e8dc6a 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -213,7 +213,7 @@ void pci_setup(void)
             ((pci_mem_start << 1) != 0) )
         pci_mem_start <<= 1;
 
-    if ( (pci_mem_start << 1) != 0 ) {
+    if ( mmio_total > (pci_mem_end - pci_mem_start) ) {
         printf("Low MMIO hole not large enough for all devices,"
                " relocating some BARs to 64-bit\n");
         bar64_relocate = 1;
-- 
1.7.9.5
George Dunlap
2013-Jun-20 16:33 UTC
[PATCH v3 5/8] hvmloader: Correct bug in low mmio region accounting
When deciding whether to map a device in low MMIO space (<4GiB), hvmloader compares it with "mmio_left", which is set to the size of the low MMIO range (pci_mem_end - pci_mem_start). However, even if it does map a device in high MMIO space, it still removes the size of its BAR from mmio_left.

In reality we don't need to do a separate accounting of the low memory available -- this can be calculated from mem_resource. Just get rid of the variable and the duplicate accounting entirely. This will make the code more robust.

v3:
 - Use mem_resource values directly instead of doing duplicate accounting

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 4e8dc6a..606ccca 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -42,7 +42,6 @@ void pci_setup(void)
     uint32_t vga_devfn = 256;
     uint16_t class, vendor_id, device_id;
     unsigned int bar, pin, link, isa_irq;
-    int64_t mmio_left;
 
     /* Resources assignable to PCI devices via BARs. */
     struct resource {
@@ -258,8 +257,6 @@ void pci_setup(void)
     io_resource.base = 0xc000;
     io_resource.max = 0x10000;
 
-    mmio_left = pci_mem_end - pci_mem_start;
-
     /* Assign iomem and ioport resources in descending order of size. */
     for ( i = 0; i < nr_bars; i++ )
     {
@@ -267,7 +264,8 @@ void pci_setup(void)
         bar_reg = bars[i].bar_reg;
         bar_sz = bars[i].bar_sz;
 
-        using_64bar = bars[i].is_64bar && bar64_relocate && (mmio_left < bar_sz);
+        using_64bar = bars[i].is_64bar && bar64_relocate
+            && (bar_sz > (mem_resource.max - mem_resource.base));
         bar_data = pci_readl(devfn, bar_reg);
 
         if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
@@ -289,7 +287,6 @@ void pci_setup(void)
                 resource = &mem_resource;
                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             }
-            mmio_left -= bar_sz;
         }
         else
         {
-- 
1.7.9.5
George Dunlap
2013-Jun-20 16:33 UTC
[PATCH v3 6/8] hvmloader: Load large devices into high MMIO space as needed
Keep track of how much MMIO space is left in total, as well as the amount of "low" MMIO space (<4GiB), and only load devices into high memory if there is not enough low memory for the rest of the devices to fit. Because devices are processed by size in order from large to small, this should preferentially relocate devices with large BARs to 64-bit space.

v3:
 - Just use mmio_total rather than introducing a new variable.
 - Port to using mem_resource directly rather than low_mmio_left

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 606ccca..ac55d3e 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -264,8 +264,12 @@ void pci_setup(void)
         bar_reg = bars[i].bar_reg;
         bar_sz = bars[i].bar_sz;
 
+        /* Relocate to high memory if the total amount of MMIO needed
+         * is more than the low MMIO available.  Because devices are
+         * processed in order of bar_sz, this will preferentially
+         * relocate larger devices to high memory first. */
         using_64bar = bars[i].is_64bar && bar64_relocate
-            && (bar_sz > (mem_resource.max - mem_resource.base));
+            && (mmio_total > (mem_resource.max - mem_resource.base));
         bar_data = pci_readl(devfn, bar_reg);
 
         if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
@@ -287,6 +291,7 @@ void pci_setup(void)
                 resource = &mem_resource;
                 bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
             }
+            mmio_total -= bar_sz;
         }
         else
         {
-- 
1.7.9.5
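The accounting after patches 5 and 6 can be modelled as follows. This is a simplified sketch that ignores BAR alignment and I/O BARs; `place_bars` and its parameters are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the relocation policy: BARs arrive sorted largest-first;
 * a 64-bit-capable BAR goes high only while the total MMIO still
 * outstanding (mmio_total) exceeds the low space remaining.  high[i]
 * is set to 1 if BAR i is relocated above 4GiB; the return value is
 * the number of relocated BARs. */
static int place_bars(const uint64_t sz[], const int is_64bar[], int n,
                      uint64_t low_avail, int high[])
{
    uint64_t mmio_total = 0;
    int i, nr_high = 0;

    for ( i = 0; i < n; i++ )
        mmio_total += sz[i];

    for ( i = 0; i < n; i++ )
    {
        high[i] = is_64bar[i] && (mmio_total > low_avail);
        mmio_total -= sz[i];          /* this BAR is now accounted for */
        if ( high[i] )
            nr_high++;
        else
            low_avail -= sz[i];       /* consumed from the low window */
    }
    return nr_high;
}
```

For example, with a 256MiB low window and 64-bit-capable BARs of 512, 256, 64 and 32 MiB, the two largest go high and the remaining two fit low, which is exactly the "largest first, only as needed" behaviour the commit message describes.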
George Dunlap
2013-Jun-20 16:33 UTC
[PATCH v3 7/8] hvmloader: Remove minimum size for BARs to relocate to 64-bit space
Allow devices with BARs less than 512MiB to be relocated to high memory. This will only be invoked if there is not enough low MMIO space to map the device, and will be done preferentially to large devices first; so in all likelihood only large devices will be remapped anyway.

This is needed to work around the issue of qemu-xen not being able to handle moving guest memory around to resize the MMIO hole. The default MMIO hole size is less than 256MiB.

v3:
 - Fixed minor style issue

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/config.h | 1 -
 tools/firmware/hvmloader/pci.c    | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index 8143d6f..6641197 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -55,7 +55,6 @@ extern struct bios_config ovmf_config;
 /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
 #define PCI_MEM_START       0xf0000000
 #define PCI_MEM_END         0xfc000000
-#define PCI_MIN_BIG_BAR_SIZE 0x20000000
 
 extern unsigned long pci_mem_start, pci_mem_end;
 
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index ac55d3e..3108c8a 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -275,9 +275,8 @@ void pci_setup(void)
         if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
              PCI_BASE_ADDRESS_SPACE_MEMORY )
         {
-            /* Mapping high memory if PCI deivce is 64 bits bar and the bar size
-               is larger than 512M */
-            if (using_64bar && (bar_sz > PCI_MIN_BIG_BAR_SIZE)) {
+            /* Mapping high memory if PCI device is 64 bits bar */
+            if ( using_64bar ) {
                 if ( high_mem_resource.base & (bar_sz - 1) )
                     high_mem_resource.base = high_mem_resource.base -
                         (high_mem_resource.base & (bar_sz - 1)) + bar_sz;
-- 
1.7.9.5
George Dunlap
2013-Jun-20 16:33 UTC
[PATCH v3 8/8] libxl, hvmloader: Don''t relocate memory for MMIO hole
At the moment, qemu-xen can't handle memory being relocated by hvmloader. This may happen if a device with a large enough memory region is passed through to the guest. At the moment, if this happens, then at some point in the future qemu will crash and the domain will hang. (qemu-traditional is fine.)

It's too late in the release to do a proper fix, so we try to do damage control. hvmloader already has mechanisms to relocate memory to 64-bit space if it can't make a big enough MMIO hole. By default this is 2GiB; if we just refuse to make the hole bigger if it will overlap with guest memory, then the relocation will happen by default.

v3:
 - Fix polarity of comparison
 - Move diagnostic messages to another patch
 - Tested with xen platform pci device hacked to have different BAR sizes {256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various memory configurations
 - Add comment explaining why we default to "allow"
 - Remove cast to bool
v2:
 - style fixes
 - fix and expand comment on the MMIO hole loop
 - use "%d" rather than "%s" -> (...)?"1":"0"
 - use bool instead of uint8_t
 - Move 64-bit bar relocate detection to another patch
 - Add more diagnostic messages

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c          | 49 +++++++++++++++++++++++++++++--
 tools/libxl/libxl_dm.c                  |  6 ++++
 xen/include/public/hvm/hvm_xs_strings.h |  1 +
 3 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 3108c8a..2364177 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -27,6 +27,8 @@
 
 #include <xen/memory.h>
 #include <xen/hvm/ioreq.h>
+#include <xen/hvm/hvm_xs_strings.h>
+#include <stdbool.h>
 
 unsigned long pci_mem_start = PCI_MEM_START;
 unsigned long pci_mem_end = PCI_MEM_END;
@@ -57,6 +59,32 @@ void pci_setup(void)
     } *bars = (struct bars *)scratch_start;
     unsigned int i, nr_bars = 0;
+    const char *s;
+    /*
+     * Do we allow hvmloader to relocate guest memory in order to
+     * increase the size of the lowmem MMIO hole?  Defaulting to 1
+     * here will mean that non-libxl toolstacks (including xend and
+     * home-grown ones) will experience this series as "no change".
+     * It does mean that those using qemu-xen will still experience
+     * the bug (described below); but it also means that those using
+     * qemu-traditional will *not* experience any change; and it also
+     * means that there is a work-around for those using qemu-xen,
+     * namely switching to qemu-traditional.
+     *
+     * If we defaulted to 0, and failing to resize the hole caused any
+     * problems with qemu-traditional, then there is no work-around.
+     *
+     * Since xend can't talk to qemu-traditional, I think this is the
+     * option that will have the least impact.
+     */
+    bool allow_memory_relocate = 1;
+
+    s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL);
+    if ( s )
+        allow_memory_relocate = strtoll(s, NULL, 0);
+    printf("Relocating guest memory for lowmem MMIO space %s\n",
+           allow_memory_relocate?"enabled":"disabled");
+
     /* Program PCI-ISA bridge with appropriate link routes. */
     isa_irq = 0;
     for ( link = 0; link < 4; link++ )
@@ -208,8 +236,25 @@ void pci_setup(void)
         pci_writew(devfn, PCI_COMMAND, cmd);
     }
 
-    while ( (mmio_total > (pci_mem_end - pci_mem_start)) &&
-            ((pci_mem_start << 1) != 0) )
+    /*
+     * At the moment qemu-xen can't deal with relocated memory regions.
+     * It's too close to the release to make a proper fix; for now,
+     * only allow the MMIO hole to grow large enough to move guest memory
+     * if we're running qemu-traditional.  Items that don't fit will be
+     * relocated into the 64-bit address space.
+     *
+     * This loop now does the following:
+     * - If allow_memory_relocate, increase the MMIO hole until it's
+     *   big enough, or until it's 2GiB
+     * - If !allow_memory_relocate, increase the MMIO hole until it's
+     *   big enough, or until it's 2GiB, or until it overlaps guest
+     *   memory
+     */
+    while ( (mmio_total > (pci_mem_end - pci_mem_start))
+            && ((pci_mem_start << 1) != 0)
+            && (allow_memory_relocate
+                || (((pci_mem_start << 1) >> PAGE_SHIFT)
+                    >= hvm_info->low_mem_pgend)) )
         pci_mem_start <<= 1;
 
     if ( mmio_total > (pci_mem_end - pci_mem_start) ) {
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index ac1f90e..342d2ce 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1154,6 +1154,12 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss)
         libxl__xs_write(gc, XBT_NULL,
                         libxl__sprintf(gc, "%s/hvmloader/bios", path),
                         "%s", libxl_bios_type_to_string(b_info->u.hvm.bios));
+        /* Disable relocating memory to make the MMIO hole larger
+         * unless we're running qemu-traditional */
+        libxl__xs_write(gc, XBT_NULL,
+                        libxl__sprintf(gc, "%s/hvmloader/allow-memory-relocate", path),
+                        "%d",
+                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);
         free(path);
     }
 
diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
index 9042303..4de5881 100644
--- a/xen/include/public/hvm/hvm_xs_strings.h
+++ b/xen/include/public/hvm/hvm_xs_strings.h
@@ -28,6 +28,7 @@
 #define HVM_XS_HVMLOADER             "hvmloader"
 #define HVM_XS_BIOS                  "hvmloader/bios"
 #define HVM_XS_GENERATION_ID_ADDRESS "hvmloader/generation-id-address"
+#define HVM_XS_ALLOW_MEMORY_RELOCATE "hvmloader/allow-memory-relocate"
 
 /* The following values allow additional ACPI tables to be added to the
  * virtual ACPI BIOS that hvmloader constructs. The values specify the guest
-- 
1.7.9.5
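The constrained hole-growth loop from this patch can be modelled in isolation. This is an illustrative sketch, not the hvmloader source; `grow_hole` is invented, and PAGE_SHIFT is assumed to be 12.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Sketch of the patched loop: grow the lowmem MMIO hole by halving its
 * start address (in 32-bit arithmetic), but when allow_memory_relocate
 * is 0, refuse any step whose new start would fall below the end of
 * lowmem RAM (low_mem_pgend, a pfn).  BARs that then do not fit are
 * meant to be relocated to 64-bit space instead. */
static uint32_t grow_hole(uint64_t mmio_total, uint32_t low_mem_pgend,
                          int allow_memory_relocate)
{
    uint32_t pci_mem_start = 0xf0000000u;       /* default hole start */
    const uint32_t pci_mem_end = 0xfc000000u;   /* default hole end   */

    while ( (mmio_total > (uint64_t)(pci_mem_end - pci_mem_start))
            && ((pci_mem_start << 1) != 0)
            && (allow_memory_relocate
                || (((pci_mem_start << 1) >> PAGE_SHIFT)
                    >= low_mem_pgend)) )
        pci_mem_start <<= 1;

    return pci_mem_start;
}
```

For a guest whose lowmem RAM ends at 3.5GiB (low_mem_pgend = 0xe0000) needing 1GiB of MMIO: with relocation allowed, the hole grows to 0x80000000 and RAM gets moved (the qemu-xen crash case); with relocation forbidden, the hole stops at 0xe0000000, just above RAM, and the remainder must go to 64-bit space.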
Stefano Stabellini
2013-Jun-20 17:06 UTC
Re: [PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments
On Thu, 20 Jun 2013, George Dunlap wrote:
> The printf() available to hvmloader does not handle 64-bit data types;
> manually break them down as two 32-bit strings.
>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> CC: Ian Campbell <ian.campbell@citrix.com>
> CC: Ian Jackson <ian.jackson@citrix.com>
> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> CC: Hanweidong <hanweidong@huawei.com>
> CC: Keir Fraser <keir@xen.org>

What's the issue with implementing %llx?
Ian Jackson
2013-Jun-20 17:09 UTC
Re: [PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments
Stefano Stabellini writes ("Re: [PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments"):
> What's the issue with implementing %llx?

Isn't this an endless series of just-before-release yaks ?  If it were me I would have left the %llx alone.

Ian.
Stefano Stabellini
2013-Jun-20 17:12 UTC
Re: [PATCH v3 2/8] hvmloader: Make the printfs more informative
On Thu, 20 Jun 2013, George Dunlap wrote:
> * Warn that you're relocating some BARs to 64-bit
>
> * Warn that you're relocating guest pages, and how many
>
> * Include upper 32-bits of the base register when printing the bar
>   placement info
[...]
>      /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
> +    if ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
> +        printf("Relocating 0x%lx pages to highmem for lowmem MMIO hole\n",
> +               hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT));

Shouldn't this be:

  min_t(unsigned int,
        hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
        (1u << 16) - 1);

to match exactly what we do in the relocation code?

Regarding the message, what about:

  printf("Relocating 0x%lx pages from 0x%lx to 0x%lx%lx to make room for a larger MMIO hole\n",
         min_t(unsigned int,
               hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
               (1u << 16) - 1),
         hvm_info->low_mem_pgend, hvm_info->high_mem_pgend);
Stefano Stabellini
2013-Jun-20 17:16 UTC
Re: [PATCH v3 3/8] hvmloader: Set up highmem resouce appropriately if there is no RAM above 4G
On Thu, 20 Jun 2013, George Dunlap wrote:> hvmloader will read hvm_info->high_mem_pgend to calculate where to start the > highmem PCI region. However, if the guest does not have any memory in the high > region, this is set to zero, which will cause hvmloader to use the "0" for the base > of the highmem region, rather than 1 << 32. > > Check to see whether hvm_info->high_mem_pgend is set; if so, do the normal calculation; > otherwise, use 1<<32. > > Signed-off-by: Geore Dunlap <george.dunlap@eu.citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Ian Jackson <ian.jackson@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org>Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>> tools/firmware/hvmloader/pci.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index d8592b0..f0f11e2 100644 > --- a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -244,7 +244,14 @@ void pci_setup(void) > hvm_info->high_mem_pgend += nr_pages; > } > > - high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT; > + if ( hvm_info->high_mem_pgend ) > + high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT; > + else > + high_mem_resource.base = 1ull << 32; > + printf("%sRAM in high memory; setting high_mem resource base to %x%08x\n", > + hvm_info->high_mem_pgend?"":"No ", > + (uint32_t)(high_mem_resource.base>>32), > + (uint32_t)high_mem_resource.base); > high_mem_resource.max = 1ull << cpu_phys_addr(); > mem_resource.base = pci_mem_start; > mem_resource.max = pci_mem_end; > -- > 1.7.9.5 >
Stefano Stabellini
2013-Jun-20 17:20 UTC
Re: [PATCH v3 5/8] hvmloader: Correct bug in low mmio region accounting
On Thu, 20 Jun 2013, George Dunlap wrote:> When deciding whether to map a device in low MMIO space (<4GiB), > hvmloader compares it with "mmio_left", which is set to the size of > the low MMIO range (pci_mem_end - pci_mem_start). However, even if it > does map a device in high MMIO space, it still removes the size of its > BAR from mmio_left. > > In reality we don''t need to do a separate accounting of the low memory > available -- this can be calculated from mem_resource. Just get rid of > the variable and the duplicate accounting entirely. This will make the code > more robust. > > v3: > - Use mem_resource values directly instead of doing duplicate accounting > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > CC: Ian Jackson <ian.jackson@citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org>Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>> tools/firmware/hvmloader/pci.c | 7 ++----- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index 4e8dc6a..606ccca 100644 > --- a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -42,7 +42,6 @@ void pci_setup(void) > uint32_t vga_devfn = 256; > uint16_t class, vendor_id, device_id; > unsigned int bar, pin, link, isa_irq; > - int64_t mmio_left; > > /* Resources assignable to PCI devices via BARs. */ > struct resource { > @@ -258,8 +257,6 @@ void pci_setup(void) > io_resource.base = 0xc000; > io_resource.max = 0x10000; > > - mmio_left = pci_mem_end - pci_mem_start; > - > /* Assign iomem and ioport resources in descending order of size. 
*/ > for ( i = 0; i < nr_bars; i++ ) > { > @@ -267,7 +264,8 @@ void pci_setup(void) > bar_reg = bars[i].bar_reg; > bar_sz = bars[i].bar_sz; > > - using_64bar = bars[i].is_64bar && bar64_relocate && (mmio_left < bar_sz); > + using_64bar = bars[i].is_64bar && bar64_relocate > + && (bar_sz > (mem_resource.max - mem_resource.base)); > bar_data = pci_readl(devfn, bar_reg); > > if ( (bar_data & PCI_BASE_ADDRESS_SPACE) => @@ -289,7 +287,6 @@ void pci_setup(void) > resource = &mem_resource; > bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK; > } > - mmio_left -= bar_sz; > } > else > { > -- > 1.7.9.5 >
Stefano Stabellini
2013-Jun-20 17:23 UTC
Re: [PATCH v3 6/8] hvmloader: Load large devices into high MMIO space as needed
On Thu, 20 Jun 2013, George Dunlap wrote:> Keep track of how much mmio space is left total, as well as the amount > of "low" MMIO space (<4GiB), and only load devices into high memory > if there is not enough low memory for the rest of the devices to fit. > > Because devices are processed by size in order from large to small, > this should preferentially relocate devices with large BARs to 64-bit > space. > > v3: > - Just use mmio_total rather than introducing a new variable. > - Port to using mem_resource directly rather than low_mmio_left > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > CC: Ian Jackson <ian.jackson@citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org>Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>> tools/firmware/hvmloader/pci.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index 606ccca..ac55d3e 100644 > --- a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -264,8 +264,12 @@ void pci_setup(void) > bar_reg = bars[i].bar_reg; > bar_sz = bars[i].bar_sz; > > + /* Relocate to high memory if the total amount of MMIO needed > + * is more than the low MMIO available. Because devices are > + * processed in order of bar_sz, this will preferentially > + * relocate larger devices to high memory first. */ > using_64bar = bars[i].is_64bar && bar64_relocate > - && (bar_sz > (mem_resource.max - mem_resource.base)); > + && (mmio_total > (mem_resource.max - mem_resource.base)); > bar_data = pci_readl(devfn, bar_reg); > > if ( (bar_data & PCI_BASE_ADDRESS_SPACE) => @@ -287,6 +291,7 @@ void pci_setup(void) > resource = &mem_resource; > bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK; > } > + mmio_total -= bar_sz; > } > else > { > -- > 1.7.9.5 >
Stefano Stabellini
2013-Jun-20 17:38 UTC
Re: [PATCH v3 8/8] libxl, hvmloader: Don''t relocate memory for MMIO hole
On Thu, 20 Jun 2013, George Dunlap wrote:> At the moment, qemu-xen can''t handle memory being relocated by > hvmloader. This may happen if a device with a large enough memory > region is passed through to the guest. At the moment, if this > happens, then at some point in the future qemu will crash and the > domain will hang. (qemu-traditional is fine.) > > It''s too late in the release to do a proper fix, so we try to do > damage control. > > hvmloader already has mechanisms to relocate memory to 64-bit space > if it can''t make a big enough MMIO hole. By default this is 2GiB; if > we just refuse to make the hole bigger if it will overlap with guest > memory, then the relocation will happen by default. > > v3: > - Fix polarity of comparison > - Move diagnostic messages to another patch > - Tested with xen platform pci device hacked to have different BAR sizes > {256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various memory > configurations > - Add comment explaining why we default to "allow" > - Remove cast to bool > v2: > - style fixes > - fix and expand comment on the MMIO hole loop > - use "%d" rather than "%s" -> (...)?"1":"0" > - use bool instead of uint8_t > - Move 64-bit bar relocate detection to another patch > - Add more diagnostic messages > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Ian Jackson <ian.jackson@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org> > CC: Keir Fraser <keir@xen.org> > --- > tools/firmware/hvmloader/pci.c | 49 +++++++++++++++++++++++++++++-- > tools/libxl/libxl_dm.c | 6 ++++ > xen/include/public/hvm/hvm_xs_strings.h | 1 + > 3 files changed, 54 insertions(+), 2 deletions(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index 3108c8a..2364177 100644 > --- a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -27,6 
+27,8 @@ > > #include <xen/memory.h> > #include <xen/hvm/ioreq.h> > +#include <xen/hvm/hvm_xs_strings.h> > +#include <stdbool.h> > > unsigned long pci_mem_start = PCI_MEM_START; > unsigned long pci_mem_end = PCI_MEM_END; > @@ -57,6 +59,32 @@ void pci_setup(void) > } *bars = (struct bars *)scratch_start; > unsigned int i, nr_bars = 0; > > + const char *s; > + /* > + * Do we allow hvmloader to relocate guest memory in order to > + * increase the size of the lowmem MMIO hole? Defaulting to 1 > + * here will mean that non-libxl toolstacks (including xend and > + * home-grown ones) will experience this series as "no change". > + * It does mean that those using qemu-xen will still experience > + * the bug (described below); but it also means that those using > + * qemu-traditional will *not* experience any change; and it also > + * means that there is a work-around for those using qemu-xen, > + * namely switching to qemu-traditional. > + * > + * If we defaulted to 0, and failing to resize the hole caused any > + * problems with qemu-traditional, then there is no work-around. > + * > + * Since xend can''t talk to qemu-traditional, I think this is the > + * option that will have the least impact. > + */ > + bool allow_memory_relocate = 1; > + > + s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL); > + if ( s ) > + allow_memory_relocate = strtoll(s, NULL, 0);Considering that strtoll retuns a long long, are we sure that this allocation does what we want for all the possible long long values that can be returned? For example, if strtoll returns -1, do we want allow_memory_relocate to be set to true?> + printf("Relocating guest memory for lowmem MMIO space %s\n", > + allow_memory_relocate?"enabled":"disabled"); > + > /* Program PCI-ISA bridge with appropriate link routes. 
*/ > isa_irq = 0; > for ( link = 0; link < 4; link++ ) > @@ -208,8 +236,25 @@ void pci_setup(void) > pci_writew(devfn, PCI_COMMAND, cmd); > } > > - while ( (mmio_total > (pci_mem_end - pci_mem_start)) && > - ((pci_mem_start << 1) != 0) ) > + /* > + * At the moment qemu-xen can''t deal with relocated memory regions. > + * It''s too close to the release to make a proper fix; for now, > + * only allow the MMIO hole to grow large enough to move guest memory > + * if we''re running qemu-traditional. Items that don''t fit will be > + * relocated into the 64-bit address space. > + * > + * This loop now does the following: > + * - If allow_memory_relocate, increase the MMIO hole until it''s > + * big enough, or until it''s 2GiB > + * - If !allow_memory_relocate, increase the MMIO hole until it''s > + * big enough, or until it''s 2GiB, or until it overlaps guest > + * memory > + */ > + while ( (mmio_total > (pci_mem_end - pci_mem_start)) > + && ((pci_mem_start << 1) != 0) > + && (allow_memory_relocate > + || (((pci_mem_start << 1) >> PAGE_SHIFT) > + >= hvm_info->low_mem_pgend)) ) > pci_mem_start <<= 1; > > if ( mmio_total > (pci_mem_end - pci_mem_start) ) { > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index ac1f90e..342d2ce 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -1154,6 +1154,12 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) > libxl__xs_write(gc, XBT_NULL, > libxl__sprintf(gc, "%s/hvmloader/bios", path), > "%s", libxl_bios_type_to_string(b_info->u.hvm.bios)); > + /* Disable relocating memory to make the MMIO hole larger > + * unless we''re running qemu-traditional */ > + libxl__xs_write(gc, XBT_NULL, > + libxl__sprintf(gc, "%s/hvmloader/allow-memory-relocate", path), > + "%d", > + b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);line length> free(path); > } > > diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h > index 
9042303..4de5881 100644 > --- a/xen/include/public/hvm/hvm_xs_strings.h > +++ b/xen/include/public/hvm/hvm_xs_strings.h > @@ -28,6 +28,7 @@ > #define HVM_XS_HVMLOADER "hvmloader" > #define HVM_XS_BIOS "hvmloader/bios" > #define HVM_XS_GENERATION_ID_ADDRESS "hvmloader/generation-id-address" > +#define HVM_XS_ALLOW_MEMORY_RELOCATE "hvmloader/allow-memory-relocate" > > /* The following values allow additional ACPI tables to be added to the > * virtual ACPI BIOS that hvmloader constructs. The values specify the guest > -- > 1.7.9.5 >
Jan Beulich
2013-Jun-21 07:01 UTC
Re: [PATCH v3 3/8] hvmloader: Set up highmem resouce appropriately if there is no RAM above 4G
>>> On 20.06.13 at 18:33, George Dunlap <george.dunlap@eu.citrix.com> wrote:
> @@ -244,7 +244,14 @@ void pci_setup(void)
>              hvm_info->high_mem_pgend += nr_pages;
>          }
>
> -    high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
> +    if ( hvm_info->high_mem_pgend )

To be on the safe side I'd make this

    if ( hvm_info->high_mem_pgend >= (1ull << 32) )

> +        high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
> +    else
> +        high_mem_resource.base = 1ull << 32;
> +    printf("%sRAM in high memory; setting high_mem resource base to %x%08x\n",
> +           hvm_info->high_mem_pgend?"":"No ",
> +           (uint32_t)(high_mem_resource.base>>32),
> +           (uint32_t)high_mem_resource.base);

Seeing the n-th incarnation of this - mind creating a macro to do the
splitting?

Jan

>      high_mem_resource.max = 1ull << cpu_phys_addr();
>      mem_resource.base = pci_mem_start;
>      mem_resource.max = pci_mem_end;
Jan Beulich
2013-Jun-21 07:09 UTC
Re: [PATCH v3 5/8] hvmloader: Correct bug in low mmio region accounting
>>> On 20.06.13 at 18:33, George Dunlap <george.dunlap@eu.citrix.com> wrote: > When deciding whether to map a device in low MMIO space (<4GiB), > hvmloader compares it with "mmio_left", which is set to the size of > the low MMIO range (pci_mem_end - pci_mem_start). However, even if it > does map a device in high MMIO space, it still removes the size of its > BAR from mmio_left. > > In reality we don''t need to do a separate accounting of the low memory > available -- this can be calculated from mem_resource. Just get rid of > the variable and the duplicate accounting entirely. This will make the code > more robust. > > v3: > - Use mem_resource values directly instead of doing duplicate accounting > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>Reviewed-by: Jan Beulich <jbeulich@suse.com>> CC: Ian Jackson <ian.jackson@citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org> > --- > tools/firmware/hvmloader/pci.c | 7 ++----- > 1 file changed, 2 insertions(+), 5 deletions(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index 4e8dc6a..606ccca 100644 > --- a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -42,7 +42,6 @@ void pci_setup(void) > uint32_t vga_devfn = 256; > uint16_t class, vendor_id, device_id; > unsigned int bar, pin, link, isa_irq; > - int64_t mmio_left; > > /* Resources assignable to PCI devices via BARs. */ > struct resource { > @@ -258,8 +257,6 @@ void pci_setup(void) > io_resource.base = 0xc000; > io_resource.max = 0x10000; > > - mmio_left = pci_mem_end - pci_mem_start; > - > /* Assign iomem and ioport resources in descending order of size. 
*/ > for ( i = 0; i < nr_bars; i++ ) > { > @@ -267,7 +264,8 @@ void pci_setup(void) > bar_reg = bars[i].bar_reg; > bar_sz = bars[i].bar_sz; > > - using_64bar = bars[i].is_64bar && bar64_relocate && (mmio_left < bar_sz); > + using_64bar = bars[i].is_64bar && bar64_relocate > + && (bar_sz > (mem_resource.max - mem_resource.base)); > bar_data = pci_readl(devfn, bar_reg); > > if ( (bar_data & PCI_BASE_ADDRESS_SPACE) => @@ -289,7 +287,6 @@ void pci_setup(void) > resource = &mem_resource; > bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK; > } > - mmio_left -= bar_sz; > } > else > { > -- > 1.7.9.5 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Jan Beulich
2013-Jun-21 07:12 UTC
Re: [PATCH v3 6/8] hvmloader: Load large devices into high MMIO space as needed
>>> On 20.06.13 at 18:33, George Dunlap <george.dunlap@eu.citrix.com> wrote: > Keep track of how much mmio space is left total, as well as the amount > of "low" MMIO space (<4GiB), and only load devices into high memory > if there is not enough low memory for the rest of the devices to fit. > > Because devices are processed by size in order from large to small, > this should preferentially relocate devices with large BARs to 64-bit > space. > > v3: > - Just use mmio_total rather than introducing a new variable. > - Port to using mem_resource directly rather than low_mmio_left > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>Reviewed-by: Jan Beulich <jbeulich@suse.com>> CC: Ian Jackson <ian.jackson@citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org> > --- > tools/firmware/hvmloader/pci.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index 606ccca..ac55d3e 100644 > --- a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -264,8 +264,12 @@ void pci_setup(void) > bar_reg = bars[i].bar_reg; > bar_sz = bars[i].bar_sz; > > + /* Relocate to high memory if the total amount of MMIO needed > + * is more than the low MMIO available. Because devices are > + * processed in order of bar_sz, this will preferentially > + * relocate larger devices to high memory first. 
*/ > using_64bar = bars[i].is_64bar && bar64_relocate > - && (bar_sz > (mem_resource.max - mem_resource.base)); > + && (mmio_total > (mem_resource.max - mem_resource.base)); > bar_data = pci_readl(devfn, bar_reg); > > if ( (bar_data & PCI_BASE_ADDRESS_SPACE) => @@ -287,6 +291,7 @@ void pci_setup(void) > resource = &mem_resource; > bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK; > } > + mmio_total -= bar_sz; > } > else > { > -- > 1.7.9.5 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Jan Beulich
2013-Jun-21 07:17 UTC
Re: [PATCH v3 8/8] libxl, hvmloader: Don''t relocate memory for MMIO hole
>>> On 20.06.13 at 18:33, George Dunlap <george.dunlap@eu.citrix.com> wrote: > @@ -57,6 +59,32 @@ void pci_setup(void) > } *bars = (struct bars *)scratch_start; > unsigned int i, nr_bars = 0; > > + const char *s; > + /* > + * Do we allow hvmloader to relocate guest memory in order to > + * increase the size of the lowmem MMIO hole? Defaulting to 1 > + * here will mean that non-libxl toolstacks (including xend and > + * home-grown ones) will experience this series as "no change". > + * It does mean that those using qemu-xen will still experience > + * the bug (described below); but it also means that those using > + * qemu-traditional will *not* experience any change; and it also > + * means that there is a work-around for those using qemu-xen, > + * namely switching to qemu-traditional. > + * > + * If we defaulted to 0, and failing to resize the hole caused any > + * problems with qemu-traditional, then there is no work-around. > + * > + * Since xend can''t talk to qemu-traditional, I think this is theqemu-xen?> + * option that will have the least impact. > + */ > + bool allow_memory_relocate = 1; > + > + s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL); > + if ( s ) > + allow_memory_relocate = strtoll(s, NULL, 0); > + printf("Relocating guest memory for lowmem MMIO space %s\n", > + allow_memory_relocate?"enabled":"disabled"); > + > /* Program PCI-ISA bridge with appropriate link routes. */ > isa_irq = 0; > for ( link = 0; link < 4; link++ ) > @@ -208,8 +236,25 @@ void pci_setup(void) > pci_writew(devfn, PCI_COMMAND, cmd); > } > > - while ( (mmio_total > (pci_mem_end - pci_mem_start)) && > - ((pci_mem_start << 1) != 0) ) > + /* > + * At the moment qemu-xen can''t deal with relocated memory regions. > + * It''s too close to the release to make a proper fix; for now, > + * only allow the MMIO hole to grow large enough to move guest memory > + * if we''re running qemu-traditional. Items that don''t fit will be > + * relocated into the 64-bit address space. 
> + * > + * This loop now does the following: > + * - If allow_memory_relocate, increase the MMIO hole until it''s > + * big enough, or until it''s 2GiB > + * - If !allow_memory_relocate, increase the MMIO hole until it''s > + * big enough, or until it''s 2GiB, or until it overlaps guest > + * memory > + */ > + while ( (mmio_total > (pci_mem_end - pci_mem_start)) > + && ((pci_mem_start << 1) != 0) > + && (allow_memory_relocate > + || (((pci_mem_start << 1) >> PAGE_SHIFT) > + >= hvm_info->low_mem_pgend)) ) > pci_mem_start <<= 1; > > if ( mmio_total > (pci_mem_end - pci_mem_start) ) { > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index ac1f90e..342d2ce 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -1154,6 +1154,12 @@ void libxl__spawn_local_dm(libxl__egc *egc, > libxl__dm_spawn_state *dmss) > libxl__xs_write(gc, XBT_NULL, > libxl__sprintf(gc, "%s/hvmloader/bios", path), > "%s", libxl_bios_type_to_string(b_info->u.hvm.bios)); > + /* Disable relocating memory to make the MMIO hole larger > + * unless we''re running qemu-traditional */ > + libxl__xs_write(gc, XBT_NULL, > + libxl__sprintf(gc, > "%s/hvmloader/allow-memory-relocate", path), > + "%d", > + > b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONA > L); > free(path); > } > > diff --git a/xen/include/public/hvm/hvm_xs_strings.h > b/xen/include/public/hvm/hvm_xs_strings.h > index 9042303..4de5881 100644 > --- a/xen/include/public/hvm/hvm_xs_strings.h > +++ b/xen/include/public/hvm/hvm_xs_strings.h > @@ -28,6 +28,7 @@ > #define HVM_XS_HVMLOADER "hvmloader" > #define HVM_XS_BIOS "hvmloader/bios" > #define HVM_XS_GENERATION_ID_ADDRESS "hvmloader/generation-id-address" > +#define HVM_XS_ALLOW_MEMORY_RELOCATE "hvmloader/allow-memory-relocate" > > /* The following values allow additional ACPI tables to be added to the > * virtual ACPI BIOS that hvmloader constructs. 
The values specify the > guest > -- > 1.7.9.5 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
George Dunlap
2013-Jun-21 08:31 UTC
Re: [PATCH v3 8/8] libxl, hvmloader: Don''t relocate memory for MMIO hole
On 20/06/13 18:38, Stefano Stabellini wrote:> On Thu, 20 Jun 2013, George Dunlap wrote: >> At the moment, qemu-xen can''t handle memory being relocated by >> hvmloader. This may happen if a device with a large enough memory >> region is passed through to the guest. At the moment, if this >> happens, then at some point in the future qemu will crash and the >> domain will hang. (qemu-traditional is fine.) >> >> It''s too late in the release to do a proper fix, so we try to do >> damage control. >> >> hvmloader already has mechanisms to relocate memory to 64-bit space >> if it can''t make a big enough MMIO hole. By default this is 2GiB; if >> we just refuse to make the hole bigger if it will overlap with guest >> memory, then the relocation will happen by default. >> >> v3: >> - Fix polarity of comparison >> - Move diagnostic messages to another patch >> - Tested with xen platform pci device hacked to have different BAR sizes >> {256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various memory >> configurations >> - Add comment explaining why we default to "allow" >> - Remove cast to bool >> v2: >> - style fixes >> - fix and expand comment on the MMIO hole loop >> - use "%d" rather than "%s" -> (...)?"1":"0" >> - use bool instead of uint8_t >> - Move 64-bit bar relocate detection to another patch >> - Add more diagnostic messages >> >> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> >> CC: Ian Campbell <ian.campbell@citrix.com> >> CC: Ian Jackson <ian.jackson@citrix.com> >> CC: Stefano Stabellini <stefano.stabellini@citrix.com> >> CC: Hanweidong <hanweidong@huawei.com> >> CC: Keir Fraser <keir@xen.org> >> CC: Keir Fraser <keir@xen.org> >> --- >> tools/firmware/hvmloader/pci.c | 49 +++++++++++++++++++++++++++++-- >> tools/libxl/libxl_dm.c | 6 ++++ >> xen/include/public/hvm/hvm_xs_strings.h | 1 + >> 3 files changed, 54 insertions(+), 2 deletions(-) >> >> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c >> index 3108c8a..2364177 100644 
>> --- a/tools/firmware/hvmloader/pci.c >> +++ b/tools/firmware/hvmloader/pci.c >> @@ -27,6 +27,8 @@ >> >> #include <xen/memory.h> >> #include <xen/hvm/ioreq.h> >> +#include <xen/hvm/hvm_xs_strings.h> >> +#include <stdbool.h> >> >> unsigned long pci_mem_start = PCI_MEM_START; >> unsigned long pci_mem_end = PCI_MEM_END; >> @@ -57,6 +59,32 @@ void pci_setup(void) >> } *bars = (struct bars *)scratch_start; >> unsigned int i, nr_bars = 0; >> >> + const char *s; >> + /* >> + * Do we allow hvmloader to relocate guest memory in order to >> + * increase the size of the lowmem MMIO hole? Defaulting to 1 >> + * here will mean that non-libxl toolstacks (including xend and >> + * home-grown ones) will experience this series as "no change". >> + * It does mean that those using qemu-xen will still experience >> + * the bug (described below); but it also means that those using >> + * qemu-traditional will *not* experience any change; and it also >> + * means that there is a work-around for those using qemu-xen, >> + * namely switching to qemu-traditional. >> + * >> + * If we defaulted to 0, and failing to resize the hole caused any >> + * problems with qemu-traditional, then there is no work-around. >> + * >> + * Since xend can''t talk to qemu-traditional, I think this is the >> + * option that will have the least impact. >> + */ >> + bool allow_memory_relocate = 1; >> + >> + s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL); >> + if ( s ) >> + allow_memory_relocate = strtoll(s, NULL, 0); > Considering that strtoll retuns a long long, are we sure that this > allocation does what we want for all the possible long long values > that can be returned? > > For example, if strtoll returns -1, do we want allow_memory_relocate to > be set to true?The only valid values here are "0" and "1"; everything else is undefined. Look, the bike shed is already painted, the brushes have been washed and put away. Leave it be. 
:-)> > > >> + printf("Relocating guest memory for lowmem MMIO space %s\n", >> + allow_memory_relocate?"enabled":"disabled"); >> + >> /* Program PCI-ISA bridge with appropriate link routes. */ >> isa_irq = 0; >> for ( link = 0; link < 4; link++ ) >> @@ -208,8 +236,25 @@ void pci_setup(void) >> pci_writew(devfn, PCI_COMMAND, cmd); >> } >> >> - while ( (mmio_total > (pci_mem_end - pci_mem_start)) && >> - ((pci_mem_start << 1) != 0) ) >> + /* >> + * At the moment qemu-xen can''t deal with relocated memory regions. >> + * It''s too close to the release to make a proper fix; for now, >> + * only allow the MMIO hole to grow large enough to move guest memory >> + * if we''re running qemu-traditional. Items that don''t fit will be >> + * relocated into the 64-bit address space. >> + * >> + * This loop now does the following: >> + * - If allow_memory_relocate, increase the MMIO hole until it''s >> + * big enough, or until it''s 2GiB >> + * - If !allow_memory_relocate, increase the MMIO hole until it''s >> + * big enough, or until it''s 2GiB, or until it overlaps guest >> + * memory >> + */ >> + while ( (mmio_total > (pci_mem_end - pci_mem_start)) >> + && ((pci_mem_start << 1) != 0) >> + && (allow_memory_relocate >> + || (((pci_mem_start << 1) >> PAGE_SHIFT) >> + >= hvm_info->low_mem_pgend)) ) >> pci_mem_start <<= 1; >> >> if ( mmio_total > (pci_mem_end - pci_mem_start) ) { >> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c >> index ac1f90e..342d2ce 100644 >> --- a/tools/libxl/libxl_dm.c >> +++ b/tools/libxl/libxl_dm.c >> @@ -1154,6 +1154,12 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) >> libxl__xs_write(gc, XBT_NULL, >> libxl__sprintf(gc, "%s/hvmloader/bios", path), >> "%s", libxl_bios_type_to_string(b_info->u.hvm.bios)); >> + /* Disable relocating memory to make the MMIO hole larger >> + * unless we''re running qemu-traditional */ >> + libxl__xs_write(gc, XBT_NULL, >> + libxl__sprintf(gc, "%s/hvmloader/allow-memory-relocate", 
path), >> + "%d", >> + b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL); > line lengthBlame the person who made a *47-character* identifier. Seriously, would you rather have the "==" on a separate line or something? That seems really ridiculous. I''ll break the sprintf though. -George
George Dunlap
2013-Jun-21 08:32 UTC
Re: [PATCH v3 8/8] libxl, hvmloader: Don''t relocate memory for MMIO hole
On 21/06/13 08:17, Jan Beulich wrote:
>>>> On 20.06.13 at 18:33, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> @@ -57,6 +59,32 @@ void pci_setup(void)
>>       } *bars = (struct bars *)scratch_start;
>>       unsigned int i, nr_bars = 0;
>>
>> +    const char *s;
>> +    /*
>> +     * Do we allow hvmloader to relocate guest memory in order to
>> +     * increase the size of the lowmem MMIO hole?  Defaulting to 1
>> +     * here will mean that non-libxl toolstacks (including xend and
>> +     * home-grown ones) will experience this series as "no change".
>> +     * It does mean that those using qemu-xen will still experience
>> +     * the bug (described below); but it also means that those using
>> +     * qemu-traditional will *not* experience any change; and it also
>> +     * means that there is a work-around for those using qemu-xen,
>> +     * namely switching to qemu-traditional.
>> +     *
>> +     * If we defaulted to 0, and failing to resize the hole caused any
>> +     * problems with qemu-traditional, then there is no work-around.
>> +     *
>> +     * Since xend can't talk to qemu-traditional, I think this is the
> qemu-xen?

Oops, good catch. :-)

 -George
George Dunlap
2013-Jun-21 08:34 UTC
Re: [PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments
On 20/06/13 18:06, Stefano Stabellini wrote:
> On Thu, 20 Jun 2013, George Dunlap wrote:
>> The printf() available to hvmloader does not handle 64-bit data types;
>> manually break them down as two 32-bit strings.
>>
>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Ian Jackson <ian.jackson@citrix.com>
>> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
>> CC: Hanweidong <hanweidong@huawei.com>
>> CC: Keir Fraser <keir@xen.org>
> What's the issue with implementing %llx?

It involves implementing __udiv64(), since hvmloader runs in 32-bit mode.

 -George
George Dunlap
2013-Jun-21 08:37 UTC
Re: [PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments
On 20/06/13 18:09, Ian Jackson wrote:
> Stefano Stabellini writes ("Re: [PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments"):
>> What's the issue with implementing %llx?
> Isn't this an endless series of just-before-release yaks?
>
> If it were me I would have left the %llx alone.

I needed to print the 64-bit values for diagnostic purposes, particularly
when testing boundary conditions; that's why I did the manual break-down.
Changing the faked-up printf to handle uint64_t wasn't too difficult, but
it required implementing a 64-bit integer division, which was more than I
was up for.

 -George
George Dunlap
2013-Jun-21 09:06 UTC
Re: [PATCH v3 3/8] hvmloader: Set up highmem resource appropriately if there is no RAM above 4G
On 21/06/13 08:01, Jan Beulich wrote:
>>>> On 20.06.13 at 18:33, George Dunlap <george.dunlap@eu.citrix.com> wrote:
>> @@ -244,7 +244,14 @@ void pci_setup(void)
>> hvm_info->high_mem_pgend += nr_pages;
>> }
>>
>> - high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
>> + if ( hvm_info->high_mem_pgend )
> To be on the safe side I'd make this
>
> if ( hvm_info->high_mem_pgend >= (1ull << 32) )

high_mem_pgend is in frames, not physical address; but I do take your point. On the other hand, there is other code (for instance, in the memory relocation loop) that assumes that high_mem_pgend is either 0, or pointing into the high mem region. Normally the correct way to deal with internal invariants like this is to add an assert() to that effect, but adding a new assert() at this point is not really on either.

I'd just as soon leave it the way it is, but if people felt strongly about it I could change it to something like this:

    high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
    if ( high_mem_resource.base < 1ull << 32 )
        high_mem_resource.base = 1ull << 32;

>> + high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT;
>> + else
>> + high_mem_resource.base = 1ull << 32;
>> + printf("%sRAM in high memory; setting high_mem resource base to %x%08x\n",
>> + hvm_info->high_mem_pgend?"":"No ",
>> + (uint32_t)(high_mem_resource.base>>32),
>> + (uint32_t)high_mem_resource.base);
> Seeing the n-th incarnation of this - mind creating a macro to do the
> splitting?

I'll give it a try and see how it looks.

 -George
George Dunlap
2013-Jun-21 09:22 UTC
Re: [PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments
On 21/06/13 09:34, George Dunlap wrote:
> On 20/06/13 18:06, Stefano Stabellini wrote:
>> On Thu, 20 Jun 2013, George Dunlap wrote:
>>> The printf() available to hvmloader does not handle 64-bit data types;
>>> manually break them down as two 32-bit strings.
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>> CC: Ian Campbell <ian.campbell@citrix.com>
>>> CC: Ian Jackson <ian.jackson@citrix.com>
>>> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
>>> CC: Hanweidong <hanweidong@huawei.com>
>>> CC: Keir Fraser <keir@xen.org>
>> What's the issue with implementing %llx?
>
> It involves implementing __udiv64(), since hvmloader runs in 32-bit mode.

Actually, it looks like hvmloader already does have a "divide 64-bit by 32-bit" function I could use; but that would be a more significant change in functionality in the paths that are run by basically everyone, rather than minimal changes for people not affected by the bug and major changes only for people who are going to crash anyway.

 -George
George Dunlap
2013-Jun-21 09:35 UTC
Re: [PATCH v3 2/8] hvmloader: Make the printfs more informative
On 20/06/13 18:12, Stefano Stabellini wrote:
> On Thu, 20 Jun 2013, George Dunlap wrote:
>> * Warn that you're relocating some BARs to 64-bit
>>
>> * Warn that you're relocating guest pages, and how many
>>
>> * Include upper 32-bits of the base register when printing the bar
>> placement info
>>
>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>> CC: Ian Campbell <ian.campbell@citrix.com>
>> CC: Ian Jackson <ian.jackson@citrix.com>
>> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
>> CC: Hanweidong <hanweidong@huawei.com>
>> CC: Keir Fraser <keir@xen.org>
>> ---
>> tools/firmware/hvmloader/pci.c | 13 ++++++++++---
>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
>> index aa54bc1..d8592b0 100644
>> --- a/tools/firmware/hvmloader/pci.c
>> +++ b/tools/firmware/hvmloader/pci.c
>> @@ -213,10 +213,17 @@ void pci_setup(void)
>> ((pci_mem_start << 1) != 0) )
>> pci_mem_start <<= 1;
>>
>> - if ( (pci_mem_start << 1) != 0 )
>> + if ( (pci_mem_start << 1) != 0 ) {
>> + printf("Low MMIO hole not large enough for all devices,"
>> + " relocating some BARs to 64-bit\n");
>> bar64_relocate = 1;
>> + }
>>
>> /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
>> + if ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
>> + printf("Relocating 0x%lx pages to highmem for lowmem MMIO hole\n",
>> + hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT));
> Shouldn't this be:
>
> min_t(unsigned int,
> hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
> (1u << 16) - 1);
>
> to match exactly what we do in the relocation code?

No; the relocation is done in a loop which will run until the condition in the if above is satisfied. We could, I suppose, do the printf on each iteration of the loop; if I'm doing the math right*, the maximum iterations around the loop should be 8, and a typical number would be just 1 or 2.

* Maximum MMIO size: 2GiB == 1<<31.
In 4k pages, that's 1<<(31-12) == 1<<19. This will do a batch of 1<<16 pages at a time, leaving 1<<3 iterations maximum, or 8. (1<<16 pages is 1<<(16+12) or 1<<28 bytes, or 1<<8 == 256 megabytes moved at a time.)

> Regarding the message, what about:
>
> printf("Relocating 0x%lx pages from 0x%lx to 0x%lx%lx to make room for a larger MMIO hole\n",
> min_t(unsigned int,
> hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
> (1u << 16) - 1),
> hvm_info->low_mem_pgend,
> hvm_info->high_mem_pgend);

Lemme see how it looks...

 -George
George Dunlap
2013-Jun-21 09:50 UTC
Re: [PATCH v3 2/8] hvmloader: Make the printfs more informative
On 21/06/13 10:35, George Dunlap wrote:
> On 20/06/13 18:12, Stefano Stabellini wrote:
>> On Thu, 20 Jun 2013, George Dunlap wrote:
>>> * Warn that you're relocating some BARs to 64-bit
>>>
>>> * Warn that you're relocating guest pages, and how many
>>>
>>> * Include upper 32-bits of the base register when printing the bar
>>> placement info
>>>
>>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>> CC: Ian Campbell <ian.campbell@citrix.com>
>>> CC: Ian Jackson <ian.jackson@citrix.com>
>>> CC: Stefano Stabellini <stefano.stabellini@citrix.com>
>>> CC: Hanweidong <hanweidong@huawei.com>
>>> CC: Keir Fraser <keir@xen.org>
>>> ---
>>> tools/firmware/hvmloader/pci.c | 13 ++++++++++---
>>> 1 file changed, 10 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/tools/firmware/hvmloader/pci.c
>>> b/tools/firmware/hvmloader/pci.c
>>> index aa54bc1..d8592b0 100644
>>> --- a/tools/firmware/hvmloader/pci.c
>>> +++ b/tools/firmware/hvmloader/pci.c
>>> @@ -213,10 +213,17 @@ void pci_setup(void)
>>> ((pci_mem_start << 1) != 0) )
>>> pci_mem_start <<= 1;
>>> - if ( (pci_mem_start << 1) != 0 )
>>> + if ( (pci_mem_start << 1) != 0 ) {
>>> + printf("Low MMIO hole not large enough for all devices,"
>>> + " relocating some BARs to 64-bit\n");
>>> bar64_relocate = 1;
>>> + }
>>> /* Relocate RAM that overlaps PCI space (in 64k-page
>>> chunks). */
>>> + if ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
>>> + printf("Relocating 0x%lx pages to highmem for lowmem MMIO
>>> hole\n",
>>> + hvm_info->low_mem_pgend - (pci_mem_start >>
>>> PAGE_SHIFT));
>> Shouldn't this be:
>>
>> min_t(unsigned int,
>> hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
>> (1u << 16) - 1);
>>
>> to match exactly what we do in the relocation code?
>
> No; the relocation is done in a loop which will run until the
> condition in the if above is satisfied.
>
> We could, I suppose, do the printf on each iteration of the loop; if
> I'm doing the math right*, the maximum iterations around the loop
> should be 8, and a typical number would be just 1 or 2.
>
> * Maximum MMIO size: 2GiB == 1<<31. In 4k pages, that's 1<<(31-12)
> == 1<<19. This will do a batch of 1<<16 pages at a time, leaving 1<<3
> iterations maximum, or 8. (1<<16 pages is 1<<(16+12) or 1<<28 bytes,
> or 1<<8 == 256 megabytes moved at a time.)
>
>> Regarding the message, what about:
>>
>> printf("Relocating 0x%lx pages from 0x%lx to 0x%lx%lx to make room
>> for a larger MMIO hole\n",
>> min_t(unsigned int,
>> hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
>> (1u << 16) - 1),
>> hvm_info->low_mem_pgend,
>> hvm_info->high_mem_pgend);
>
> Lemme see how it looks...

If you have a 2GiB MMIO hole, it looks like this:

(XEN) HVM42: Relocating 0xffff pages from 0e0001000 to 20f800000 for lowmem MMIO hole
(XEN) HVM42: Relocating 0xffff pages from 0d0002000 to 21f7ff000 for lowmem MMIO hole
(XEN) HVM42: Relocating 0xffff pages from 0c0003000 to 22f7fe000 for lowmem MMIO hole
(XEN) HVM42: Relocating 0xffff pages from 0b0004000 to 23f7fd000 for lowmem MMIO hole
(XEN) HVM42: Relocating 0xffff pages from 0a0005000 to 24f7fc000 for lowmem MMIO hole
(XEN) HVM42: Relocating 0xffff pages from 090006000 to 25f7fb000 for lowmem MMIO hole
(XEN) HVM42: Relocating 0xffff pages from 080007000 to 26f7fa000 for lowmem MMIO hole
(XEN) HVM42: Relocating 0x7 pages from 080000000 to 27f7f9000 for lowmem MMIO hole

Kind of ugly; I think I liked it better with just one printf.

 -George
Ian Jackson
2013-Jun-21 10:22 UTC
Re: [PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments [and 1 more messages]
George Dunlap writes ("[Xen-devel] [PATCH 0/8] Relocate devices rather than memory for qemu-xen"):
> This is the third version of a patch series to address the issue of
> qemu-xen not being able to handle moving guest memory in order to
> resize the lowmem MMIO hole.
>
> This series adds and relocates several patches. A brief summary can be
> seen below:

Thanks for the extensive case analysis. I hope the other reviewers are reading it too !

Ian.
Ian Jackson
2013-Jun-21 10:23 UTC
Re: [PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments
George Dunlap writes ("[PATCH v3 1/8] hvmloader: Remove all 64-bit print arguments"):> The printf() available to hvmloader does not handle 64-bit data types; > manually break them down as two 32-bit strings.Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2013-Jun-21 10:35 UTC
Re: [PATCH v3 2/8] hvmloader: Make the printfs more informative
George Dunlap writes ("[PATCH v3 2/8] hvmloader: Make the printfs more informative"):
> * Warn that you're relocating some BARs to 64-bit
>
> * Warn that you're relocating guest pages, and how many
>
> * Include upper 32-bits of the base register when printing the bar
> placement info

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2013-Jun-21 10:38 UTC
Re: [PATCH v3 3/8] hvmloader: Set up highmem resource appropriately if there is no RAM above 4G
George Dunlap writes ("[PATCH v3 3/8] hvmloader: Set up highmem resource appropriately if there is no RAM above 4G"):
> hvmloader will read hvm_info->high_mem_pgend to calculate where to start the
> highmem PCI region. However, if the guest does not have any memory in the high
> region, this is set to zero, which will cause hvmloader to use the "0" for the base
> of the highmem region, rather than 1 << 32.
>
> Check to see whether hvm_info->high_mem_pgend is set; if so, do the normal calculation;
> otherwise, use 1<<32.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.
George Dunlap
2013-Jun-21 10:46 UTC
[PATCH 0/8] Relocate devices rather than memory for qemu-xen
This is the fourth version of a patch series to address the issue of qemu-xen not being able to handle moving guest memory in order to resize the lowmem MMIO hole. A brief summary can be seen below:

 - 1/8 hvmloader: Remove all 64-bit print arguments
 - 2/8 hvmloader: Make the printfs more informative
 - 3/8 hvmloader: Set up highmem resource appropriately if there is no RAM above 4G
 A 4/8 hvmloader: Fix check for needing a 64-bit bar
 A 5/8 hvmloader: Correct bug in low mmio region accounting
 A 6/8 hvmloader: Load large devices into high MMIO space as needed
 A 7/8 hvmloader: Remove minimum size for BARs to relocate to 64-bit space
 - 8/8 libxl,hvmloader: Don't relocate memory for MMIO hole

Key:
 -: Changes in v4
 A: Reviewed / Acked by 2 or more people

= The situation =

The default MMIO hole for Xen systems starts at 0xf0000000; this leaves just under 256MiB of space for PCI devices. At the moment, hvmloader will scan the pci bus for devices and resize this hole, up to 2GiB (i.e., starting at 0x80000000) to make space, relocating any overlapping guest memory above 4GiB (0x100000000).

(After that point, if there is still not enough space, the intention seemed to be that it would begin mapping devices with 64-bit-capable BARs into high memory as well, just above the end of RAM; however, there seems to be a bug in the code which detects this condition; it is likely that the 64-bit remapping code was never capable of being triggered.)

We expect the default MMIO hole to be insufficient only when passing through devices to guests. This works fine for qemu-traditional, but qemu-xen unfortunately has expectations of where guest memory will be, and will get confused if it moves. If hvmloader does relocate guest RAM, then at some point qemu will try to map that pfn space, resulting in a seg fault and qemu crashing.
hvmloader of course will only move RAM if it would overlap the MMIO region; this means that if the guest has a small enough amount of RAM -- say, 2GiB -- then this "memory move" condition will also never be triggered.

So at the moment, under the following conditions:

 * A user is passing through a device or set of devices requiring more MMIO space than the default MMIO hole
 * The user has enough memory that resizing the hole will overlap guest memory
 * The user is using qemu-xen (not qemu-traditional)

the user will shortly after boot experience qemu crashing.

= The proposed fix =

This patch series makes the following functional changes.

The core change is this:

 * When running qemu-xen, don't resize the MMIO hole; instead rely on devices being moved into the 64-bit MMIO region.

In order to make this more effective, we also make the following changes to the 64-bit relocation code:

 * Allow devices smaller than 512MiB to be relocated to the high MMIO region.
 * When moving devices into the 64-bit MMIO region, start with the ones with the largest BARs, and only relocate them if there is not enough space for all the remaining BARs.

= Risk analysis =

There are two kinds of risks: risks due to unintended changes (i.e., a bug in the patch itself), and risks due to intended changes. We hope that we can solve the first by a combination of testing and code review. The rest of this analysis will assume that the patch is correct, and will try to do a risk analysis on the effects of the patch.

The main risk is that moving some devices into 64-bit memory will cause problems with the operation of those devices. Relocating a device may have the following outcomes:

1. In the best case, the relocated device will Just Work.

2. A relocated device may fail in a way that leaves the OS intact: the guest OS may not be able to see it, or the driver may not load.

3.
A relocated device may fail in a way that crashes the guest OS: the driver may crash, or one of the relocated devices which fails may be system-critical.

4. A relocated device may fail in a way which is unpredictable, but does not cause data loss: crashing the guest randomly at some point in the future, or causing strange quirks in functionality (e.g., network connectivity dropping, glitches when watching video).

5. A relocated device may fail in a way that is unpredictable, and corrupts data.

Outcomes 1-3 are equivalent or strictly better than crashing within a few minutes of boot. Outcome 4 is arguably also not much worse. The main risk to our users would be #5. However:

 - This is definitely a bug in the driver, OS, or the hardware
 - This is a bug that might be seen running on real hardware, or in KVM (or some other hypervisor)
 - This is not a bug that we would be likely to catch, even if we had a full development cycle worth of testing.

I think we should therefore not worry about #5, and consider in general that relocating a device into 64-bit space will be no worse, and potentially better, than crashing within a few minutes of boot.

There is another risk with this method, which is that a user may end up passing through a number of devices with NON-64-bit BARs such that the devices cannot all fit in the default lowmem MMIO region, but also cannot be remapped into the 64-bit region. If this is the case, then some devices will simply not be able to be mapped. If these non-mapped devices are system-critical, the VM will not boot; if they are not, then the devices will simply be invisible. Both of these are no worse than, and potentially better than, crashing within a few minutes of boot.

Starting with all VMs:

Any VM running in PV mode will be unaffected.

Any VM running in HVM mode but not passing through devices will be unaffected.

Any VM running in HVM mode and passing through devices that fit inside the default MMIO space will be unaffected.
Any VM running in HVM mode, passing through devices that require less than 2GiB of MMIO space, *and* having a low enough guest memory that the MMIO hole can be enlarged without moving guest memory, will be unaffected. (For example, if you need 512MiB and you have <3584 MiB of guest RAM; or if you need 1024MiB and have <3072 MiB of guest RAM.)

Any VM running in HVM mode, passing through devices requiring less than 2GiB of MMIO space, and using qemu-traditional, will be unaffected.

For a VM running in HVM mode, passing through devices which require more than 2GiB of MMIO space, using qemu-traditional, and having more than 2GiB of guest memory:

 * We believe that at the moment what will happen is that because of a bug in hvmloader (fixed in this series), no devices will be mapped in 64-bit space; instead, the smallest devices will simply not be mapped. This will likely cause critical platform devices not to be mapped, causing the VM not to be able to boot.

 * With this patch, the largest devices *will* be remapped into 64-bit space.

 * If we are right that the current code will fail, this is a uniform improvement, even if the devices don't work.

 * If the current code would work, then a different set of devices will be re-mapped to high memory. This may change some configurations from "works" into "doesn't work". I think this is a small enough contingent of users that this is an acceptable amount of risk to take.

We have now covered all configurations of qemu-traditional. Since xend only knows how to use qemu-traditional, this also covers all configurations using xend.

For VMs running in HVM mode, using qemu-xen, but not using libxl, this patch will have no effect: qemu-xen will crash. NB that this cannot include xend, as it only knows how to drive qemu-traditional. This can be worked around by using qemu-traditional instead, or by setting the appropriate xenstore key on boot.
This is acceptable, because this is not really a supported configuration; users should use one of the supported toolstacks, or use libxl.

We have now covered all users of any non-libxl-based toolstack.

For a VM running in HVM mode, using qemu-xen, using libxl, and passing through devices such that the required 32-bit-only MMIO space does not fit in the default MMIO hole, and with enough memory that resizing the MMIO hole requires moving guest RAM:

 * At the moment, hvmloader will relocate guest memory. This will cause qemu-xen to crash within a few minutes.

 * With this change, the devices with the smallest BARs will simply not be mapped. If these devices are non-critical, they will simply be invisible to the OS; if these devices are critical, the OS will not boot. Crashing immediately or having non-visible devices are the same or better than crashing a few minutes into boot, so this is an improvement (or at least not a regression).

For a VM running in HVM mode, using qemu-xen, using libxl, having a required 32-bit-only MMIO space that does fit within the default MMIO hole, but a total MMIO space that does not, and having enough memory that resizing the MMIO hole requires moving guest RAM:

 * At the moment, hvmloader will relocate memory. This will cause qemu-xen to crash within a few minutes of booting. Note that this is true whether the total MMIO space is less than 2GiB or more.

 * With this change, devices with the largest BARs will be relocated to 64-bit space. We expect that in general, the devices thus relocated will be the passed-through PCI devices. We have decided already to consider any outcome of mapping a device into a 64-bit address space to be no worse than, and potentially better than, qemu-xen crashing; so this can be considered an improvement.

We have now covered all possible configurations.
In summary: * The vast majority of configurations are unaffected * For those that are affected, the vast majority are either a strict improvement, or no worse than, the status quo. * There is a slight possibility that in one extreme corner case (using qemu-traditional with >2GiB of MMIO space), we may possibly be changing "works" into "fails". I think this is an acceptable risk. Therefore, I think the risks posed by this change are acceptable. CC: George Dunlap <george.dunlap@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap
2013-Jun-21 10:46 UTC
[PATCH v4 1/8] hvmloader: Remove all 64-bit print arguments
The printf() available to hvmloader does not handle 64-bit data types; manually break them down as two 32-bit strings. v4: - Make macros for the requisite format and bit shifting Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org> --- tools/firmware/hvmloader/pci.c | 11 +++++++---- tools/firmware/hvmloader/util.h | 2 ++ 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c index c78d4d3..c1cb1e9 100644 --- a/tools/firmware/hvmloader/pci.c +++ b/tools/firmware/hvmloader/pci.c @@ -290,8 +290,9 @@ void pci_setup(void) if ( (base < resource->base) || (base > resource->max) ) { - printf("pci dev %02x:%x bar %02x size %llx: no space for " - "resource!\n", devfn>>3, devfn&7, bar_reg, bar_sz); + printf("pci dev %02x:%x bar %02x size "PRIllx": no space for " + "resource!\n", devfn>>3, devfn&7, bar_reg, + PRIllx_arg(bar_sz)); continue; } @@ -300,8 +301,10 @@ void pci_setup(void) pci_writel(devfn, bar_reg, bar_data); if (using_64bar) pci_writel(devfn, bar_reg + 4, bar_data_upper); - printf("pci dev %02x:%x bar %02x size %llx: %08x\n", - devfn>>3, devfn&7, bar_reg, bar_sz, bar_data); + printf("pci dev %02x:%x bar %02x size "PRIllx": %08x\n", + devfn>>3, devfn&7, bar_reg, + PRIllx_arg(bar_sz), + bar_data); /* Now enable the memory or I/O mapping. */ diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h index 7913259..9ccb905 100644 --- a/tools/firmware/hvmloader/util.h +++ b/tools/firmware/hvmloader/util.h @@ -168,6 +168,8 @@ void byte_to_hex(char *digits, uint8_t byte); void uuid_to_string(char *dest, uint8_t *uuid); /* Debug output */ +#define PRIllx "%x%08x" +#define PRIllx_arg(ll) (uint32_t)((ll)>>32), (uint32_t)(ll) int printf(const char *fmt, ...) 
__attribute__ ((format (printf, 1, 2))); int vprintf(const char *fmt, va_list ap); -- 1.7.9.5
George Dunlap
2013-Jun-21 10:46 UTC
[PATCH v4 2/8] hvmloader: Make the printfs more informative
* Warn that you're relocating some BARs to 64-bit

* Warn that you're relocating guest pages, and how many

* Include upper 32-bits of the base register when printing the bar placement info

v4:
 - Move message about relocating guest pages into loop, include number of pages and guest paddr
 - Fixed minor brace style issue

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
tools/firmware/hvmloader/pci.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index c1cb1e9..44168e2 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -214,7 +214,11 @@ void pci_setup(void) pci_mem_start <<= 1; if ( (pci_mem_start << 1) != 0 ) + { + printf("Low MMIO hole not large enough for all devices," + " relocating some BARs to 64-bit\n"); bar64_relocate = 1; + } /* Relocate RAM that overlaps PCI space (in 64k-page chunks).
*/ while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend ) @@ -227,6 +231,11 @@ void pci_setup(void) if ( hvm_info->high_mem_pgend == 0 ) hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT); hvm_info->low_mem_pgend -= nr_pages; + printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\ + " for lowmem MMIO hole\n", + nr_pages, + PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT), + PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT)); xatp.domid = DOMID_SELF; xatp.space = XENMAPSPACE_gmfn_range; xatp.idx = hvm_info->low_mem_pgend; @@ -301,10 +310,10 @@ void pci_setup(void) pci_writel(devfn, bar_reg, bar_data); if (using_64bar) pci_writel(devfn, bar_reg + 4, bar_data_upper); - printf("pci dev %02x:%x bar %02x size "PRIllx": %08x\n", + printf("pci dev %02x:%x bar %02x size "PRIllx": %x%08x\n", devfn>>3, devfn&7, bar_reg, PRIllx_arg(bar_sz), - bar_data); + bar_data_upper, bar_data); /* Now enable the memory or I/O mapping. */ -- 1.7.9.5
George Dunlap
2013-Jun-21 10:46 UTC
[PATCH v4 3/8] hvmloader: Set up highmem resource appropriately if there is no RAM above 4G
hvmloader will read hvm_info->high_mem_pgend to calculate where to start the highmem PCI region. However, if the guest does not have any memory in the high region, this is set to zero, which will cause hvmloader to use the "0" for the base of the highmem region, rather than 1 << 32.

Check to see whether hvm_info->high_mem_pgend is set; if so, do the normal calculation; otherwise, use 1<<32.

v4:
 - Handle case where hvm_info->high_mem_pgend is non-zero but doesn't point into high memory, throwing a warning.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
tools/firmware/hvmloader/pci.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 44168e2..a3d03ed 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -246,7 +246,18 @@ void pci_setup(void) hvm_info->high_mem_pgend += nr_pages; } - high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT; + high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT; + if ( high_mem_resource.base < 1ull << 32 ) + { + if ( hvm_info->high_mem_pgend != 0 ) + printf("WARNING: hvm_info->high_mem_pgend %x" + " does not point into high memory!", + hvm_info->high_mem_pgend); + high_mem_resource.base = 1ull << 32; + } + printf("%sRAM in high memory; setting high_mem resource base to "PRIllx"\n", + hvm_info->high_mem_pgend?"":"No ", + PRIllx_arg(high_mem_resource.base)); high_mem_resource.max = 1ull << cpu_phys_addr(); mem_resource.base = pci_mem_start; mem_resource.max = pci_mem_end; -- 1.7.9.5
George Dunlap
2013-Jun-21 10:46 UTC
[PATCH v4 4/8] hvmloader: Fix check for needing a 64-bit bar
After attempting to resize the MMIO hole, the check to determine whether there is a need to relocate BARs into 64-bit space checks the specific thing that caused the loop to exit (MMIO hole == 2GiB) rather than checking whether the required MMIO will fit in the hole. But even then it does it wrong: the polarity of the check is backwards. Check for the actual condition we care about (the size of the MMIO hole) rather than checking for the loop exit condition. v3: - Move earlier in the series, before other functional changes Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org> --- tools/firmware/hvmloader/pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c index a3d03ed..6792ed4 100644 --- a/tools/firmware/hvmloader/pci.c +++ b/tools/firmware/hvmloader/pci.c @@ -213,7 +213,7 @@ void pci_setup(void) ((pci_mem_start << 1) != 0) ) pci_mem_start <<= 1; - if ( (pci_mem_start << 1) != 0 ) + if ( mmio_total > (pci_mem_end - pci_mem_start) ) { printf("Low MMIO hole not large enough for all devices," " relocating some BARs to 64-bit\n"); -- 1.7.9.5
George Dunlap
2013-Jun-21 10:46 UTC
[PATCH v4 5/8] hvmloader: Correct bug in low mmio region accounting
When deciding whether to map a device in low MMIO space (<4GiB), hvmloader compares it with "mmio_left", which is set to the size of the low MMIO range (pci_mem_end - pci_mem_start). However, even if it does map a device in high MMIO space, it still removes the size of its BAR from mmio_left.

In reality we don't need to do a separate accounting of the low memory available -- this can be calculated from mem_resource. Just get rid of the variable and the duplicate accounting entirely. This will make the code more robust.

v3:
 - Use mem_resource values directly instead of doing duplicate accounting

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
tools/firmware/hvmloader/pci.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 6792ed4..80eef76 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -42,7 +42,6 @@ void pci_setup(void) uint32_t vga_devfn = 256; uint16_t class, vendor_id, device_id; unsigned int bar, pin, link, isa_irq; - int64_t mmio_left; /* Resources assignable to PCI devices via BARs. */ struct resource { @@ -264,8 +263,6 @@ void pci_setup(void) io_resource.base = 0xc000; io_resource.max = 0x10000; - mmio_left = pci_mem_end - pci_mem_start; /* Assign iomem and ioport resources in descending order of size.
*/ for ( i = 0; i < nr_bars; i++ ) { @@ -273,7 +270,8 @@ void pci_setup(void) bar_reg = bars[i].bar_reg; bar_sz = bars[i].bar_sz; - using_64bar = bars[i].is_64bar && bar64_relocate && (mmio_left < bar_sz); + using_64bar = bars[i].is_64bar && bar64_relocate + && (bar_sz > (mem_resource.max - mem_resource.base)); bar_data = pci_readl(devfn, bar_reg); if ( (bar_data & PCI_BASE_ADDRESS_SPACE) =@@ -295,7 +293,6 @@ void pci_setup(void) resource = &mem_resource; bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK; } - mmio_left -= bar_sz; } else { -- 1.7.9.5
George Dunlap
2013-Jun-21 10:46 UTC
[PATCH v4 6/8] hvmloader: Load large devices into high MMIO space as needed
Keep track of how much mmio space is left total, as well as the amount of "low" MMIO space (<4GiB), and only load devices into high memory if there is not enough low memory for the rest of the devices to fit. Because devices are processed by size in order from large to small, this should preferentially relocate devices with large BARs to 64-bit space. v3: - Just use mmio_total rather than introducing a new variable. - Port to using mem_resource directly rather than low_mmio_left Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org> --- tools/firmware/hvmloader/pci.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c index 80eef76..3f368f3 100644 --- a/tools/firmware/hvmloader/pci.c +++ b/tools/firmware/hvmloader/pci.c @@ -270,8 +270,12 @@ void pci_setup(void) bar_reg = bars[i].bar_reg; bar_sz = bars[i].bar_sz; + /* Relocate to high memory if the total amount of MMIO needed + * is more than the low MMIO available. Because devices are + * processed in order of bar_sz, this will preferentially + * relocate larger devices to high memory first. */ using_64bar = bars[i].is_64bar && bar64_relocate - && (bar_sz > (mem_resource.max - mem_resource.base)); + && (mmio_total > (mem_resource.max - mem_resource.base)); bar_data = pci_readl(devfn, bar_reg); if ( (bar_data & PCI_BASE_ADDRESS_SPACE) =@@ -293,6 +297,7 @@ void pci_setup(void) resource = &mem_resource; bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK; } + mmio_total -= bar_sz; } else { -- 1.7.9.5
George Dunlap
2013-Jun-21 10:46 UTC
[PATCH v4 7/8] hvmloader: Remove minimum size for BARs to relocate to 64-bit space
Allow devices with BARs less than 512MiB to be relocated to high
memory.  This will only be invoked if there is not enough low MMIO
space to map the device, and will be done preferentially to large
devices first; so in all likelihood only large devices will be
remapped anyway.

This is needed to work around the issue of qemu-xen not being able to
handle moving guest memory around to resize the MMIO hole.  The
default MMIO hole size is less than 256MiB.

v3:
 - Fixed minor style issue

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/config.h | 1 -
 tools/firmware/hvmloader/pci.c    | 5 ++---
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index 8143d6f..6641197 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -55,7 +55,6 @@ extern struct bios_config ovmf_config;
 /* MMIO hole: Hardcoded defaults, which can be dynamically expanded. */
 #define PCI_MEM_START 0xf0000000
 #define PCI_MEM_END   0xfc000000
-#define PCI_MIN_BIG_BAR_SIZE 0x20000000

 extern unsigned long pci_mem_start, pci_mem_end;

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 3f368f3..60e1a69 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -281,9 +281,8 @@ void pci_setup(void)
         if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
              PCI_BASE_ADDRESS_SPACE_MEMORY )
         {
-            /* Mapping high memory if PCI deivce is 64 bits bar and the bar size
-               is larger than 512M */
-            if (using_64bar && (bar_sz > PCI_MIN_BIG_BAR_SIZE)) {
+            /* Mapping high memory if PCI device is 64 bits bar */
+            if ( using_64bar ) {
                 if ( high_mem_resource.base & (bar_sz - 1) )
                     high_mem_resource.base = high_mem_resource.base -
                         (high_mem_resource.base & (bar_sz - 1)) + bar_sz;
-- 
1.7.9.5
George Dunlap
2013-Jun-21 10:46 UTC
[PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
At the moment, qemu-xen can't handle memory being relocated by
hvmloader.  This may happen if a device with a large enough memory
region is passed through to the guest.  At the moment, if this
happens, then at some point in the future qemu will crash and the
domain will hang.  (qemu-traditional is fine.)

It's too late in the release to do a proper fix, so we try to do
damage control.

hvmloader already has mechanisms to relocate memory to 64-bit space if
it can't make a big enough MMIO hole.  By default this is 2GiB; if we
just refuse to make the hole bigger if it will overlap with guest
memory, then the relocation will happen by default.

v4:
 - Wrap long line in libxl_dm.c
 - Fix comment
v3:
 - Fix polarity of comparison
 - Move diagnostic messages to another patch
 - Tested with xen platform pci device hacked to have different BAR sizes
   {256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various memory
   configurations
 - Add comment explaining why we default to "allow"
 - Remove cast to bool
v2:
 - style fixes
 - fix and expand comment on the MMIO hole loop
 - use "%d" rather than "%s" -> (...)?"1":"0"
 - use bool instead of uint8_t
 - Move 64-bit bar relocate detection to another patch
 - Add more diagnostic messages

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <stefano.stabellini@citrix.com>
CC: Hanweidong <hanweidong@huawei.com>
CC: Keir Fraser <keir@xen.org>
CC: Keir Fraser <keir@xen.org>
---
 tools/firmware/hvmloader/pci.c          | 49 +++++++++++++++++++++++++++++--
 tools/libxl/libxl_dm.c                  |  8 +++++
 xen/include/public/hvm/hvm_xs_strings.h |  1 +
 3 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 60e1a69..48edd5e 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -27,6 +27,8 @@

 #include <xen/memory.h>
 #include <xen/hvm/ioreq.h>
+#include <xen/hvm/hvm_xs_strings.h>
+#include <stdbool.h>

 unsigned long pci_mem_start = PCI_MEM_START;
 unsigned long pci_mem_end = PCI_MEM_END;
@@ -57,6 +59,32 @@ void pci_setup(void)
     } *bars = (struct bars *)scratch_start;
     unsigned int i, nr_bars = 0;

+    const char *s;
+    /*
+     * Do we allow hvmloader to relocate guest memory in order to
+     * increase the size of the lowmem MMIO hole?  Defaulting to 1
+     * here will mean that non-libxl toolstacks (including xend and
+     * home-grown ones) will experience this series as "no change".
+     * It does mean that those using qemu-xen will still experience
+     * the bug (described below); but it also means that those using
+     * qemu-traditional will *not* experience any change; and it also
+     * means that there is a work-around for those using qemu-xen,
+     * namely switching to qemu-traditional.
+     *
+     * If we defaulted to 0, and failing to resize the hole caused any
+     * problems with qemu-traditional, then there is no work-around.
+     *
+     * Since xend can only use qemu-traditional, I think this is the
+     * option that will have the least impact.
+     */
+    bool allow_memory_relocate = 1;
+
+    s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL);
+    if ( s )
+        allow_memory_relocate = strtoll(s, NULL, 0);
+    printf("Relocating guest memory for lowmem MMIO space %s\n",
+           allow_memory_relocate?"enabled":"disabled");
+
     /* Program PCI-ISA bridge with appropriate link routes. */
     isa_irq = 0;
     for ( link = 0; link < 4; link++ )
@@ -208,8 +236,25 @@ void pci_setup(void)
         pci_writew(devfn, PCI_COMMAND, cmd);
     }

-    while ( (mmio_total > (pci_mem_end - pci_mem_start)) &&
-            ((pci_mem_start << 1) != 0) )
+    /*
+     * At the moment qemu-xen can't deal with relocated memory regions.
+     * It's too close to the release to make a proper fix; for now,
+     * only allow the MMIO hole to grow large enough to move guest memory
+     * if we're running qemu-traditional.  Items that don't fit will be
+     * relocated into the 64-bit address space.
+     *
+     * This loop now does the following:
+     * - If allow_memory_relocate, increase the MMIO hole until it's
+     *   big enough, or until it's 2GiB
+     * - If !allow_memory_relocate, increase the MMIO hole until it's
+     *   big enough, or until it's 2GiB, or until it overlaps guest
+     *   memory
+     */
+    while ( (mmio_total > (pci_mem_end - pci_mem_start))
+            && ((pci_mem_start << 1) != 0)
+            && (allow_memory_relocate
+                || (((pci_mem_start << 1) >> PAGE_SHIFT)
+                    >= hvm_info->low_mem_pgend)) )
         pci_mem_start <<= 1;

     if ( mmio_total > (pci_mem_end - pci_mem_start) )
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index ac1f90e..7e54c02 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1154,6 +1154,14 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss)
         libxl__xs_write(gc, XBT_NULL,
                         libxl__sprintf(gc, "%s/hvmloader/bios", path),
                         "%s", libxl_bios_type_to_string(b_info->u.hvm.bios));
+        /* Disable relocating memory to make the MMIO hole larger
+         * unless we're running qemu-traditional */
+        libxl__xs_write(gc, XBT_NULL,
+                        libxl__sprintf(gc,
+                                       "%s/hvmloader/allow-memory-relocate",
+                                       path),
+                        "%d",
+                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);
         free(path);
     }

diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h
index 9042303..4de5881 100644
--- a/xen/include/public/hvm/hvm_xs_strings.h
+++ b/xen/include/public/hvm/hvm_xs_strings.h
@@ -28,6 +28,7 @@
 #define HVM_XS_HVMLOADER           "hvmloader"
 #define HVM_XS_BIOS                "hvmloader/bios"
 #define HVM_XS_GENERATION_ID_ADDRESS "hvmloader/generation-id-address"
+#define HVM_XS_ALLOW_MEMORY_RELOCATE "hvmloader/allow-memory-relocate"

 /* The following values allow additional ACPI tables to be added to the
  * virtual ACPI BIOS that hvmloader constructs.  The values specify the guest
-- 
1.7.9.5
Ian Jackson
2013-Jun-21 10:48 UTC
Re: [PATCH v4 1/8] hvmloader: Remove all 64-bit print arguments
George Dunlap writes ("[PATCH v4 1/8] hvmloader: Remove all 64-bit print arguments"):
> The printf() available to hvmloader does not handle 64-bit data types;
> manually break them down as two 32-bit strings.
>
> v4:
>  - Make macros for the requisite format and bit shifting

This is an improvement.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2013-Jun-21 10:49 UTC
Re: [PATCH v4 2/8] hvmloader: Make the printfs more informative
George Dunlap writes ("[PATCH v4 2/8] hvmloader: Make the printfs more informative"):
> * Warn that you're relocating some BARs to 64-bit
>
> * Warn that you're relocating guest pages, and how many
>
> * Include upper 32-bits of the base register when printing the bar
>   placement info
>
> v4:
>  - Move message about relocating guest pages into loop, include number
>    of pages and guest paddr
>  - Fixed minor brace style issue

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2013-Jun-21 10:50 UTC
Re: [PATCH v4 3/8] hvmloader: Set up highmem resouce appropriately if there is no RAM above 4G
George Dunlap writes ("[PATCH v4 3/8] hvmloader: Set up highmem resouce appropriately if there is no RAM above 4G"):
> hvmloader will read hvm_info->high_mem_pgend to calculate where to
> start the highmem PCI region.  However, if the guest does not have any
> memory in the high region, this is set to zero, which will cause
> hvmloader to use the "0" for the base of the highmem region, rather
> than 1 << 32.
>
> Check to see whether hvm_info->high_mem_pgend is set; if so, do the
> normal calculation; otherwise, use 1<<32.
>
> v4:
>  - Handle case where hvm_info->high_mem_pgend is non-zero but doesn't
>    point into high memory, throwing a warning.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2013-Jun-21 10:51 UTC
Re: [PATCH v4 4/8] hvmloader: Fix check for needing a 64-bit bar
George Dunlap writes ("[PATCH v4 4/8] hvmloader: Fix check for needing a 64-bit bar"):
> After attempting to resize the MMIO hole, the check to determine
> whether there is a need to relocate BARs into 64-bit space checks the
> specific thing that caused the loop to exit (MMIO hole == 2GiB) rather
> than checking whether the required MMIO will fit in the hole.
>
> But even then it does it wrong: the polarity of the check is
> backwards.
>
> Check for the actual condition we care about (the size of the MMIO
> hole) rather than checking for the loop exit condition.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Stefano Stabellini
2013-Jun-21 10:53 UTC
Re: [PATCH v3 2/8] hvmloader: Make the printfs more informative
On Fri, 21 Jun 2013, George Dunlap wrote:
> On 20/06/13 18:12, Stefano Stabellini wrote:
> > On Thu, 20 Jun 2013, George Dunlap wrote:
> > > * Warn that you're relocating some BARs to 64-bit
> > >
> > > * Warn that you're relocating guest pages, and how many
> > >
> > > * Include upper 32-bits of the base register when printing the bar
> > >   placement info
> > >
> > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> > > CC: Ian Campbell <ian.campbell@citrix.com>
> > > CC: Ian Jackson <ian.jackson@citrix.com>
> > > CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> > > CC: Hanweidong <hanweidong@huawei.com>
> > > CC: Keir Fraser <keir@xen.org>
> > > ---
> > >  tools/firmware/hvmloader/pci.c | 13 ++++++++++---
> > >  1 file changed, 10 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> > > index aa54bc1..d8592b0 100644
> > > --- a/tools/firmware/hvmloader/pci.c
> > > +++ b/tools/firmware/hvmloader/pci.c
> > > @@ -213,10 +213,17 @@ void pci_setup(void)
> > >              ((pci_mem_start << 1) != 0) )
> > >          pci_mem_start <<= 1;
> > > -    if ( (pci_mem_start << 1) != 0 )
> > > +    if ( (pci_mem_start << 1) != 0 ) {
> > > +        printf("Low MMIO hole not large enough for all devices,"
> > > +               " relocating some BARs to 64-bit\n");
> > >          bar64_relocate = 1;
> > > +    }
> > >      /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
> > > +    if ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
> > > +        printf("Relocating 0x%lx pages to highmem for lowmem MMIO hole\n",
> > > +               hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT));
> > Shouldn't this be:
> >
> >     min_t(unsigned int,
> >           hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
> >           (1u << 16) - 1);
> >
> > to match exactly what we do in the relocation code?
>
> No; the relocation is done in a loop which will run until the condition
> in the if above is satisfied.
> We could, I suppose, do the printf on each iteration of the loop; if I'm
> doing the math right*, the maximum iterations around the loop should be 8,
> and a typical number would be just 1 or 2.
>
> * Maximum MMIO size: 2GiB == 1<<31.  In 4-k pages, that's 1<<(31-12) ==
> 1<<19.  This will do a batch of 1<<16 pages at a time, leaving 1<<3
> iterations maximum, or 8.  (1<<16 pages is 1<<(16+12) or 1<<28 bytes, or
> 1<<8 == 256 megabytes moved at a time.)

I am not saying that we should add a printf in each iteration of the loop.
I am only saying that if we want to be correct, the number of pages that
we relocate to highmem is not:

    hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT)

but it's

    min_t(unsigned int,
          hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
          (1u << 16) - 1)

for each iteration of the loop.  It's not the same.
Stefano Stabellini
2013-Jun-21 10:53 UTC
Re: [PATCH v3 2/8] hvmloader: Make the printfs more informative
On Fri, 21 Jun 2013, George Dunlap wrote:> On 21/06/13 10:35, George Dunlap wrote: > > On 20/06/13 18:12, Stefano Stabellini wrote: > > > On Thu, 20 Jun 2013, George Dunlap wrote: > > > > * Warn that you''re relocating some BARs to 64-bit > > > > > > > > * Warn that you''re relocating guest pages, and how many > > > > > > > > * Include upper 32-bits of the base register when printing the bar > > > > placement info > > > > > > > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > > > > CC: Ian Campbell <ian.campbell@citrix.com> > > > > CC: Ian Jackson <ian.jackson@citrix.com> > > > > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > > > > CC: Hanweidong <hanweidong@huawei.com> > > > > CC: Keir Fraser <keir@xen.org> > > > > --- > > > > tools/firmware/hvmloader/pci.c | 13 ++++++++++--- > > > > 1 file changed, 10 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/tools/firmware/hvmloader/pci.c > > > > b/tools/firmware/hvmloader/pci.c > > > > index aa54bc1..d8592b0 100644 > > > > --- a/tools/firmware/hvmloader/pci.c > > > > +++ b/tools/firmware/hvmloader/pci.c > > > > @@ -213,10 +213,17 @@ void pci_setup(void) > > > > ((pci_mem_start << 1) != 0) ) > > > > pci_mem_start <<= 1; > > > > - if ( (pci_mem_start << 1) != 0 ) > > > > + if ( (pci_mem_start << 1) != 0 ) { > > > > + printf("Low MMIO hole not large enough for all devices," > > > > + " relocating some BARs to 64-bit\n"); > > > > bar64_relocate = 1; > > > > + } > > > > /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */ > > > > + if ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend ) > > > > + printf("Relocating 0x%lx pages to highmem for lowmem MMIO > > > > hole\n", > > > > + hvm_info->low_mem_pgend - (pci_mem_start >> > > > > PAGE_SHIFT)); > > > Shouldn''t this be: > > > > > > min_t(unsigned int, > > > hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT), > > > (1u << 16) - 1); > > > > > > to match exactly what we do in the relocation code? 
> > No; the relocation is done in a loop which will run until the condition
> > in the if above is satisfied.
> >
> > We could, I suppose, do the printf on each iteration of the loop; if I'm
> > doing the math right*, the maximum iterations around the loop should be 8,
> > and a typical number would be just 1 or 2.
> >
> > * Maximum MMIO size: 2GiB == 1<<31.  In 4-k pages, that's 1<<(31-12) ==
> > 1<<19.  This will do a batch of 1<<16 pages at a time, leaving 1<<3
> > iterations maximum, or 8.  (1<<16 pages is 1<<(16+12) or 1<<28 bytes, or
> > 1<<8 == 256 megabytes moved at a time.)
> >
> > > Regarding the message, what about:
> > >
> > > printf("Relocating 0x%lx pages from 0x%lx to 0x%lx%lx to make room for a
> > >        larger MMIO hole\n",
> > >        min_t(unsigned int,
> > >              hvm_info->low_mem_pgend - (pci_mem_start >> PAGE_SHIFT),
> > >              (1u << 16) - 1),
> > >        hvm_info->low_mem_pgend,
> > >        hvm_info->high_mem_pgend);
> >
> > Lemme see how it looks...
>
> If you have a 2GiB MMIO hole, it looks like this:
>
> (XEN) HVM42: Relocating 0xffff pages from 0e0001000 to 20f800000 for lowmem MMIO hole
> (XEN) HVM42: Relocating 0xffff pages from 0d0002000 to 21f7ff000 for lowmem MMIO hole
> (XEN) HVM42: Relocating 0xffff pages from 0c0003000 to 22f7fe000 for lowmem MMIO hole
> (XEN) HVM42: Relocating 0xffff pages from 0b0004000 to 23f7fd000 for lowmem MMIO hole
> (XEN) HVM42: Relocating 0xffff pages from 0a0005000 to 24f7fc000 for lowmem MMIO hole
> (XEN) HVM42: Relocating 0xffff pages from 090006000 to 25f7fb000 for lowmem MMIO hole
> (XEN) HVM42: Relocating 0xffff pages from 080007000 to 26f7fa000 for lowmem MMIO hole
> (XEN) HVM42: Relocating 0x7 pages from 080000000 to 27f7f9000 for lowmem MMIO hole
>
> Kind of ugly; I think I liked it better with just one printf.

I agree
Stefano Stabellini
2013-Jun-21 10:55 UTC
Re: [PATCH v4 1/8] hvmloader: Remove all 64-bit print arguments
On Fri, 21 Jun 2013, George Dunlap wrote:> The printf() available to hvmloader does not handle 64-bit data types; > manually break them down as two 32-bit strings. > > v4: > - Make macros for the requisite format and bit shifting > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Ian Jackson <ian.jackson@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org>Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>> tools/firmware/hvmloader/pci.c | 11 +++++++---- > tools/firmware/hvmloader/util.h | 2 ++ > 2 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index c78d4d3..c1cb1e9 100644 > --- a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -290,8 +290,9 @@ void pci_setup(void) > > if ( (base < resource->base) || (base > resource->max) ) > { > - printf("pci dev %02x:%x bar %02x size %llx: no space for " > - "resource!\n", devfn>>3, devfn&7, bar_reg, bar_sz); > + printf("pci dev %02x:%x bar %02x size "PRIllx": no space for " > + "resource!\n", devfn>>3, devfn&7, bar_reg, > + PRIllx_arg(bar_sz)); > continue; > } > > @@ -300,8 +301,10 @@ void pci_setup(void) > pci_writel(devfn, bar_reg, bar_data); > if (using_64bar) > pci_writel(devfn, bar_reg + 4, bar_data_upper); > - printf("pci dev %02x:%x bar %02x size %llx: %08x\n", > - devfn>>3, devfn&7, bar_reg, bar_sz, bar_data); > + printf("pci dev %02x:%x bar %02x size "PRIllx": %08x\n", > + devfn>>3, devfn&7, bar_reg, > + PRIllx_arg(bar_sz), > + bar_data); > > > /* Now enable the memory or I/O mapping. 
*/ > diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h > index 7913259..9ccb905 100644 > --- a/tools/firmware/hvmloader/util.h > +++ b/tools/firmware/hvmloader/util.h > @@ -168,6 +168,8 @@ void byte_to_hex(char *digits, uint8_t byte); > void uuid_to_string(char *dest, uint8_t *uuid); > > /* Debug output */ > +#define PRIllx "%x%08x" > +#define PRIllx_arg(ll) (uint32_t)((ll)>>32), (uint32_t)(ll) > int printf(const char *fmt, ...) __attribute__ ((format (printf, 1, 2))); > int vprintf(const char *fmt, va_list ap); > > -- > 1.7.9.5 >
Stefano Stabellini
2013-Jun-21 10:57 UTC
Re: [PATCH v4 2/8] hvmloader: Make the printfs more informative
On Fri, 21 Jun 2013, George Dunlap wrote:> * Warn that you''re relocating some BARs to 64-bit > > * Warn that you''re relocating guest pages, and how many > > * Include upper 32-bits of the base register when printing the bar > placement info > > v4: > - Move message about relocating guest pages into loop, include number > of pages and guest paddr > - Fixed minor brace style issue > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Ian Jackson <ian.jackson@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org> > --- > tools/firmware/hvmloader/pci.c | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index c1cb1e9..44168e2 100644 > --- a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -214,7 +214,11 @@ void pci_setup(void) > pci_mem_start <<= 1; > > if ( (pci_mem_start << 1) != 0 ) > + { > + printf("Low MMIO hole not large enough for all devices," > + " relocating some BARs to 64-bit\n"); > bar64_relocate = 1; > + } > > /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */ > while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend ) > @@ -227,6 +231,11 @@ void pci_setup(void) > if ( hvm_info->high_mem_pgend == 0 ) > hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT); > hvm_info->low_mem_pgend -= nr_pages; > + printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\ > + " for lowmem MMIO hole\n", > + nr_pages, > + PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)<<PAGE_SHIFT), > + PRIllx_arg(((uint64_t)hvm_info->high_mem_pgend)<<PAGE_SHIFT)); > xatp.domid = DOMID_SELF; > xatp.space = XENMAPSPACE_gmfn_range; > xatp.idx = hvm_info->low_mem_pgend;This wasn''t exactly what I suggested but it''s correct. 
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>> @@ -301,10 +310,10 @@ void pci_setup(void) > pci_writel(devfn, bar_reg, bar_data); > if (using_64bar) > pci_writel(devfn, bar_reg + 4, bar_data_upper); > - printf("pci dev %02x:%x bar %02x size "PRIllx": %08x\n", > + printf("pci dev %02x:%x bar %02x size "PRIllx": %x%08x\n", > devfn>>3, devfn&7, bar_reg, > PRIllx_arg(bar_sz), > - bar_data); > + bar_data_upper, bar_data); > > > /* Now enable the memory or I/O mapping. */ > -- > 1.7.9.5 >
Stefano Stabellini
2013-Jun-21 11:11 UTC
Re: [PATCH v4 3/8] hvmloader: Set up highmem resouce appropriately if there is no RAM above 4G
On Fri, 21 Jun 2013, George Dunlap wrote:> hvmloader will read hvm_info->high_mem_pgend to calculate where to > start the highmem PCI region. However, if the guest does not have any > memory in the high region, this is set to zero, which will cause > hvmloader to use the "0" for the base of the highmem region, rather > than 1 << 32. > > Check to see whether hvm_info->high_mem_pgend is set; if so, do the > normal calculation; otherwise, use 1<<32. > > v4: > > - Handle case where hfm_info->high_mem_pgend is non-zero but doesn''t > point into high memory, throwing a warning. > > > Signed-off-by: Geore Dunlap <george.dunlap@eu.citrix.com>^ ?> CC: Ian Campbell <ian.campbell@citrix.com> > CC: Ian Jackson <ian.jackson@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org>Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>> tools/firmware/hvmloader/pci.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index 44168e2..a3d03ed 100644 > --- a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -246,7 +246,18 @@ void pci_setup(void) > hvm_info->high_mem_pgend += nr_pages; > } > > - high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT; > + high_mem_resource.base = ((uint64_t)hvm_info->high_mem_pgend) << PAGE_SHIFT; > + if ( high_mem_resource.base < 1ull << 32 ) > + { > + if ( hvm_info->high_mem_pgend != 0 ) > + printf("WARNING: hvm_info->high_mem_pgend %x" > + " does not point into high memory!", > + hvm_info->high_mem_pgend); > + high_mem_resource.base = 1ull << 32; > + } > + printf("%sRAM in high memory; setting high_mem resource base to "PRIllx"\n", > + hvm_info->high_mem_pgend?"":"No ", > + PRIllx_arg(high_mem_resource.base)); > high_mem_resource.max = 1ull << cpu_phys_addr(); > mem_resource.base = pci_mem_start; > 
mem_resource.max = pci_mem_end; > -- > 1.7.9.5 >
Stefano Stabellini
2013-Jun-21 11:15 UTC
Re: [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
On Fri, 21 Jun 2013, George Dunlap wrote:> At the moment, qemu-xen can''t handle memory being relocated by > hvmloader. This may happen if a device with a large enough memory > region is passed through to the guest. At the moment, if this > happens, then at some point in the future qemu will crash and the > domain will hang. (qemu-traditional is fine.) > > It''s too late in the release to do a proper fix, so we try to do > damage control. > > hvmloader already has mechanisms to relocate memory to 64-bit space if > it can''t make a big enough MMIO hole. By default this is 2GiB; if we > just refuse to make the hole bigger if it will overlap with guest > memory, then the relocation will happen by default. > > v4: > - Wrap long line in libxl_dm.c > - Fix comment > v3: > - Fix polarity of comparison > - Move diagnostic messages to another patch > - Tested with xen platform pci device hacked to have different BAR sizes > {256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various memory > configurations > - Add comment explaining why we default to "allow" > - Remove cast to bool > v2: > - style fixes > - fix and expand comment on the MMIO hole loop > - use "%d" rather than "%s" -> (...)?"1":"0" > - use bool instead of uint8_t > - Move 64-bit bar relocate detection to another patch > - Add more diagnostic messages > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > CC: Ian Campbell <ian.campbell@citrix.com> > CC: Ian Jackson <ian.jackson@citrix.com> > CC: Stefano Stabellini <stefano.stabellini@citrix.com> > CC: Hanweidong <hanweidong@huawei.com> > CC: Keir Fraser <keir@xen.org> > CC: Keir Fraser <keir@xen.org> > --- > tools/firmware/hvmloader/pci.c | 49 +++++++++++++++++++++++++++++-- > tools/libxl/libxl_dm.c | 8 +++++ > xen/include/public/hvm/hvm_xs_strings.h | 1 + > 3 files changed, 56 insertions(+), 2 deletions(-) > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c > index 60e1a69..48edd5e 100644 > --- 
a/tools/firmware/hvmloader/pci.c > +++ b/tools/firmware/hvmloader/pci.c > @@ -27,6 +27,8 @@ > > #include <xen/memory.h> > #include <xen/hvm/ioreq.h> > +#include <xen/hvm/hvm_xs_strings.h> > +#include <stdbool.h> > > unsigned long pci_mem_start = PCI_MEM_START; > unsigned long pci_mem_end = PCI_MEM_END; > @@ -57,6 +59,32 @@ void pci_setup(void) > } *bars = (struct bars *)scratch_start; > unsigned int i, nr_bars = 0; > > + const char *s; > + /* > + * Do we allow hvmloader to relocate guest memory in order to > + * increase the size of the lowmem MMIO hole? Defaulting to 1 > + * here will mean that non-libxl toolstacks (including xend and > + * home-grown ones) will experience this series as "no change".Sorry for being anal, but "this series" is also meaningless in a comment within the code. What series? Maybe you could use "this commit" instead.> + * It does mean that those using qemu-xen will still experience > + * the bug (described below); but it also means that those using > + * qemu-traditional will *not* experience any change; and it also > + * means that there is a work-around for those using qemu-xen, > + * namely switching to qemu-traditional. > + * > + * If we defaulted to 0, and failing to resize the hole caused any > + * problems with qemu-traditional, then there is no work-around. > + * > + * Since xend can only use qemu-traditional, I think this is the > + * option that will have the least impact. > + */ > + bool allow_memory_relocate = 1; > + > + s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL); > + if ( s ) > + allow_memory_relocate = strtoll(s, NULL, 0); > + printf("Relocating guest memory for lowmem MMIO space %s\n", > + allow_memory_relocate?"enabled":"disabled"); > + > /* Program PCI-ISA bridge with appropriate link routes. 
*/ > isa_irq = 0; > for ( link = 0; link < 4; link++ ) > @@ -208,8 +236,25 @@ void pci_setup(void) > pci_writew(devfn, PCI_COMMAND, cmd); > } > > - while ( (mmio_total > (pci_mem_end - pci_mem_start)) && > - ((pci_mem_start << 1) != 0) ) > + /* > + * At the moment qemu-xen can''t deal with relocated memory regions. > + * It''s too close to the release to make a proper fix; for now, > + * only allow the MMIO hole to grow large enough to move guest memory > + * if we''re running qemu-traditional. Items that don''t fit will be > + * relocated into the 64-bit address space. > + * > + * This loop now does the following: > + * - If allow_memory_relocate, increase the MMIO hole until it''s > + * big enough, or until it''s 2GiB > + * - If !allow_memory_relocate, increase the MMIO hole until it''s > + * big enough, or until it''s 2GiB, or until it overlaps guest > + * memory > + */ > + while ( (mmio_total > (pci_mem_end - pci_mem_start)) > + && ((pci_mem_start << 1) != 0) > + && (allow_memory_relocate > + || (((pci_mem_start << 1) >> PAGE_SHIFT) > + >= hvm_info->low_mem_pgend)) ) > pci_mem_start <<= 1; > > if ( mmio_total > (pci_mem_end - pci_mem_start) ) > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index ac1f90e..7e54c02 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -1154,6 +1154,14 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) > libxl__xs_write(gc, XBT_NULL, > libxl__sprintf(gc, "%s/hvmloader/bios", path), > "%s", libxl_bios_type_to_string(b_info->u.hvm.bios)); > + /* Disable relocating memory to make the MMIO hole larger > + * unless we''re running qemu-traditional */ > + libxl__xs_write(gc, XBT_NULL, > + libxl__sprintf(gc, > + "%s/hvmloader/allow-memory-relocate", > + path), > + "%d", > + b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL); > free(path); > } > > diff --git a/xen/include/public/hvm/hvm_xs_strings.h b/xen/include/public/hvm/hvm_xs_strings.h > index 
9042303..4de5881 100644 > --- a/xen/include/public/hvm/hvm_xs_strings.h > +++ b/xen/include/public/hvm/hvm_xs_strings.h > @@ -28,6 +28,7 @@ > #define HVM_XS_HVMLOADER "hvmloader" > #define HVM_XS_BIOS "hvmloader/bios" > #define HVM_XS_GENERATION_ID_ADDRESS "hvmloader/generation-id-address" > +#define HVM_XS_ALLOW_MEMORY_RELOCATE "hvmloader/allow-memory-relocate" > > /* The following values allow additional ACPI tables to be added to the > * virtual ACPI BIOS that hvmloader constructs. The values specify the guest > -- > 1.7.9.5 >
Ian Jackson
2013-Jun-21 11:19 UTC
Re: [PATCH v4 5/8] hvmloader: Correct bug in low mmio region accounting
George Dunlap writes ("[PATCH v4 5/8] hvmloader: Correct bug in low mmio region accounting"):
> When deciding whether to map a device in low MMIO space (<4GiB),
> hvmloader compares it with "mmio_left", which is set to the size of
> the low MMIO range (pci_mem_end - pci_mem_start).  However, even if it
> does map a device in high MMIO space, it still removes the size of its
> BAR from mmio_left.
>
> In reality we don't need to do a separate accounting of the low memory
> available -- this can be calculated from mem_resource.  Just get rid
> of the variable and the duplicate accounting entirely.  This will make
> the code more robust.
...
> -        using_64bar = bars[i].is_64bar && bar64_relocate && (mmio_left < bar_sz);
> +        using_64bar = bars[i].is_64bar && bar64_relocate
> +            && (bar_sz > (mem_resource.max - mem_resource.base));

This is not entirely straightforward, I think.

The actual calculation of whether it will fit, as opposed to the precalculation above of whether it is going to fit, is done here:

        base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
        bar_data |= (uint32_t)base;
        bar_data_upper = (uint32_t)(base >> 32);
        base += bar_sz;

        if ( (base < resource->base) || (base > resource->max) )
            [ ... doesn't fit ... ]

The first line rounds the base up to a multiple of bar_sz.  I assume that this is a requirement of the PCI spec.

(While I'm here I'll note that the (uint64_t) cast in that line is unnecessary and confusing.  If bar_sz weren't 64-bit this code would be quite wrong, and putting that cast there suggests that it might not be.)

I infer (from "bar_sz &= ~(bar_sz - 1)") that bar_sz is supposed to always be a power of two.  And we have devices in descending order of size.  So at least after the first device, this rounding does nothing.

But for the first device I think it may be possible for resource->base not to be a multiple of bar_sz, and in that case it might be that the precalculation thinks the BAR will fit when the actual placement calculation finds that it doesn't.

Do you think this is possible?

This is certainly excessively confusing.  From a lack-of-regressions point of view we are going to have to analyse it properly regardless of whether we restructure it or not.

I would be tempted to suggest lifting the "base" etc. calculation into a macro or function, so that we could directly say

+        using_64bar = bars[i].is_64bar && bar64_relocate
+            && !try_allocate_resource(&mem_resource, &allocd, &new_base)

and later

-        base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
-        bar_data |= (uint32_t)base;
-        bar_data_upper = (uint32_t)(base >> 32);
-        base += bar_sz;
-
+        if ( !try_allocate_resource(resource, &allocd, &new_base) )
         {
             printf("pci dev %02x:%x bar %02x size "PRIllx": no space for "
                    "resource!\n", devfn>>3, devfn&7, bar_reg,
                    PRIllx_arg(bar_sz));
             continue;
         }

-        resource->base = base;
+        resource->base = new_base;
+        bar_data |= (uint32_t)allocd;
+        bar_data_upper = (uint32_t)(allocd >> 32);

or something.

Thanks,
Ian.
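Ian's precheck-vs-placement concern can be illustrated with a short standalone sketch. `try_fit()` is a hypothetical helper mirroring the placement step quoted above (round the base up to the BAR's power-of-two size, then check the result against the resource limits); it is not hvmloader's actual API:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of hvmloader's placement step: round the base up to the BAR's
 * (power-of-two) size, then check whether the BAR still fits within the
 * resource.  Returns 1 and advances *resource_base on success. */
static int try_fit(uint64_t *resource_base, uint64_t resource_max,
                   uint64_t bar_sz, uint64_t *placed_at)
{
    uint64_t base = (*resource_base + bar_sz - 1) & ~(bar_sz - 1);

    if ( base + bar_sz < base || base + bar_sz > resource_max )
        return 0;               /* wrapped around, or doesn't fit */
    *placed_at = base;
    *resource_base = base + bar_sz;
    return 1;
}
```

With a misaligned starting base, a naive `max - base >= bar_sz` precheck can say "fits" while the rounded placement fails, which is exactly the discrepancy under discussion here.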
Keir Fraser
2013-Jun-21 11:20 UTC
Re: [PATCH v4 1/8] hvmloader: Remove all 64-bit print arguments
On 21/06/2013 11:48, "Ian Jackson" <Ian.Jackson@eu.citrix.com> wrote:
> George Dunlap writes ("[PATCH v4 1/8] hvmloader: Remove all 64-bit print
> arguments"):
>> The printf() available to hvmloader does not handle 64-bit data types;
>> manually break them down as two 32-bit strings.
>>
>> v4:
>>  - Make macros for the requisite format and bit shifting
>
> This is an improvement.
>
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Acked-by: Keir Fraser <keir@xen.org>

I like it because it makes it easy to go implement %llx properly later.  I think it's dumb we're not doing that now to be honest; we have code in Xen that could be pinched.

-- Keir
Stefano Stabellini
2013-Jun-21 11:21 UTC
Re: [PATCH v3 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
On Fri, 21 Jun 2013, George Dunlap wrote:
> On 20/06/13 18:38, Stefano Stabellini wrote:
> > On Thu, 20 Jun 2013, George Dunlap wrote:
> > > At the moment, qemu-xen can't handle memory being relocated by
> > > hvmloader.  This may happen if a device with a large enough memory
> > > region is passed through to the guest.  At the moment, if this
> > > happens, then at some point in the future qemu will crash and the
> > > domain will hang.  (qemu-traditional is fine.)
> > >
> > > It's too late in the release to do a proper fix, so we try to do
> > > damage control.
> > >
> > > hvmloader already has mechanisms to relocate memory to 64-bit space
> > > if it can't make a big enough MMIO hole.  By default this is 2GiB; if
> > > we just refuse to make the hole bigger if it will overlap with guest
> > > memory, then the relocation will happen by default.
> > >
> > > v3:
> > >  - Fix polarity of comparison
> > >  - Move diagnostic messages to another patch
> > >  - Tested with xen platform pci device hacked to have different BAR sizes
> > >    {256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various memory
> > >    configurations
> > >  - Add comment explaining why we default to "allow"
> > >  - Remove cast to bool
> > > v2:
> > >  - style fixes
> > >  - fix and expand comment on the MMIO hole loop
> > >  - use "%d" rather than "%s" -> (...)?"1":"0"
> > >  - use bool instead of uint8_t
> > >  - Move 64-bit bar relocate detection to another patch
> > >  - Add more diagnostic messages
> > >
> > > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> > > CC: Ian Campbell <ian.campbell@citrix.com>
> > > CC: Ian Jackson <ian.jackson@citrix.com>
> > > CC: Stefano Stabellini <stefano.stabellini@citrix.com>
> > > CC: Hanweidong <hanweidong@huawei.com>
> > > CC: Keir Fraser <keir@xen.org>
> > > ---
> > >  tools/firmware/hvmloader/pci.c          | 49 +++++++++++++++++++++++++++++--
> > >  tools/libxl/libxl_dm.c                  |  6 ++++
> > >  xen/include/public/hvm/hvm_xs_strings.h |  1 +
> > >  3 files changed, 54 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> > > index 3108c8a..2364177 100644
> > > --- a/tools/firmware/hvmloader/pci.c
> > > +++ b/tools/firmware/hvmloader/pci.c
> > > @@ -27,6 +27,8 @@
> > >  #include <xen/memory.h>
> > >  #include <xen/hvm/ioreq.h>
> > > +#include <xen/hvm/hvm_xs_strings.h>
> > > +#include <stdbool.h>
> > >
> > >  unsigned long pci_mem_start = PCI_MEM_START;
> > >  unsigned long pci_mem_end = PCI_MEM_END;
> > > @@ -57,6 +59,32 @@ void pci_setup(void)
> > >      } *bars = (struct bars *)scratch_start;
> > >      unsigned int i, nr_bars = 0;
> > >
> > > +    const char *s;
> > > +    /*
> > > +     * Do we allow hvmloader to relocate guest memory in order to
> > > +     * increase the size of the lowmem MMIO hole?  Defaulting to 1
> > > +     * here will mean that non-libxl toolstacks (including xend and
> > > +     * home-grown ones) will experience this series as "no change".
> > > +     * It does mean that those using qemu-xen will still experience
> > > +     * the bug (described below); but it also means that those using
> > > +     * qemu-traditional will *not* experience any change; and it also
> > > +     * means that there is a work-around for those using qemu-xen,
> > > +     * namely switching to qemu-traditional.
> > > +     *
> > > +     * If we defaulted to 0, and failing to resize the hole caused any
> > > +     * problems with qemu-traditional, then there is no work-around.
> > > +     *
> > > +     * Since xend can't talk to qemu-traditional, I think this is the
> > > +     * option that will have the least impact.
> > > +     */
> > > +    bool allow_memory_relocate = 1;
> > > +
> > > +    s = xenstore_read(HVM_XS_ALLOW_MEMORY_RELOCATE, NULL);
> > > +    if ( s )
> > > +        allow_memory_relocate = strtoll(s, NULL, 0);
> >
> > Considering that strtoll returns a long long, are we sure that this
> > assignment does what we want for all the possible long long values
> > that can be returned?
> >
> > For example, if strtoll returns -1, do we want allow_memory_relocate to
> > be set to true?
>
> The only valid values here are "0" and "1"; everything else is undefined.

This code doesn't do what you say: "0" means false and everything else means true.  The undefined values are treated as true.  Is that what we want?

In order to do what you say you would need:

    bool allow_memory_relocate = 1;
    int i;

    i = strtoll(s, NULL, 0);
    if (i == 0 || i == 1)
        allow_memory_relocate = i;
    else
        printf("WARNING");

> Look, the bike shed is already painted, the brushes have been washed and put
> away.  Leave it be. :-)

I am leaving strtoll be.  I think it would be easier not to use strtoll at all, but I don't mind.  But if we do use strtoll, then we need to handle all the possible return values appropriately.
Ian Jackson
2013-Jun-21 11:21 UTC
Re: [PATCH v4 6/8] hvmloader: Load large devices into high MMIO space as needed
George Dunlap writes ("[PATCH v4 6/8] hvmloader: Load large devices into high MMIO space as needed"):
> Keep track of how much mmio space is left total, as well as the amount
> of "low" MMIO space (<4GiB), and only load devices into high memory if
> there is not enough low memory for the rest of the devices to fit.
>
> Because devices are processed by size in order from large to small,
> this should preferentially relocate devices with large BARs to 64-bit
> space.

Does this have similar rounding/padding considerations to those I discussed in response to 5/8?  I think it probably does...

In fact this one is worse, because you calculate whether the first (biggest) device will fit without considering its rounding, and then allocate it with rounding.

Ian.
Ian Jackson
2013-Jun-21 11:22 UTC
Re: [PATCH v4 7/8] hvmloader: Remove minimum size for BARs to relocate to 64-bit space
George Dunlap writes ("[PATCH v4 7/8] hvmloader: Remove minimum size for BARs to relocate to 64-bit space"):
> Allow devices with BARs smaller than 512MiB to be relocated to high
> memory.
>
> This will only be invoked if there is not enough low MMIO space to map
> the device, and will be done preferentially to large devices first; so
> in all likelihood only large devices will be remapped anyway.
>
> This is needed to work around the issue of qemu-xen not being able to
> handle moving guest memory around to resize the MMIO hole.  The
> default MMIO hole size is less than 256MiB.

Assuming a good answer to my responses to 5/8 and 6/8, this is fine and in line with what your 0/8 claims.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian.
Ian Jackson
2013-Jun-21 11:25 UTC
Re: [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
George Dunlap writes ("[PATCH v4 8/8] libxl,hvmloader: Don't relocate memory for MMIO hole"):
> At the moment, qemu-xen can't handle memory being relocated by
> hvmloader.  This may happen if a device with a large enough memory
> region is passed through to the guest.  At the moment, if this
> happens, then at some point in the future qemu will crash and the
> domain will hang.  (qemu-traditional is fine.)
>
> It's too late in the release to do a proper fix, so we try to do
> damage control.
>
> hvmloader already has mechanisms to relocate memory to 64-bit space if
> it can't make a big enough MMIO hole.  By default this is 2GiB; if we
> just refuse to make the hole bigger if it will overlap with guest
> memory, then the relocation will happen by default.

I see you still haven't changed it to use GCSPRINTF, but I don't think that's worth arguing about right now.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson
2013-Jun-21 11:31 UTC
Re: [PATCH v3 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
Stefano Stabellini writes ("Re: [PATCH v3 8/8] libxl,hvmloader: Don't relocate memory for MMIO hole"):
> On Fri, 21 Jun 2013, George Dunlap wrote:
> > The only valid values here are "0" and "1"; everything else is undefined.
>
> This code doesn't do what you say: "0" means false and everything else
> means true.  The undefined values are treated as true.  Is that what we
> want?

I think it's acceptable.  Many other kernel-level and embedded consumers of xenstore do similar things.  This is not a security-relevant boundary, and anyway the "wrong" behaviours are all tolerable; furthermore, hvmloader doesn't have a particularly good way to report errors.

> In order to do what you say you would need:
>
>     bool allow_memory_relocate = 1;
>     int i;
>     i = strtoll(s, NULL, 0);

No, you would also need to:

 - pass a non-NULL 2nd argument and check that on return it points to
   the nul terminator (in case the string had nondigits in it)

 - check that the input string is not empty (eg by checking that the
   end pointer returned via the 2nd argument is not equal to the start
   of the string)

 - set errno to 0 beforehand and check it afterwards (in case of ERANGE)

(And anyway I bet hvmloader's strtoll gets this wrong, so it's probably pointless.)

This is far too much faff and is ultimately pointless.  I think we should say that "0" and "1" are currently defined, and that other integer values are reserved and should be treated the same way as "1".

Ian.
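The full set of checks Ian lists can be sketched as follows. `parse_bool_strict()` is a hypothetical helper, not code from hvmloader, and assumes a C library whose `strtoll` implements the endptr/errno contract correctly:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Strictly parse "0"/"1": non-empty input, no trailing non-digits,
 * errno checked for ERANGE, and any value other than 0 or 1 rejected. */
static bool parse_bool_strict(const char *s, bool *out)
{
    char *end;
    long long v;

    if ( !s || !*s )
        return false;                /* NULL or empty string */
    errno = 0;
    v = strtoll(s, &end, 0);
    if ( *end != '\0' )
        return false;                /* trailing non-digits */
    if ( errno == ERANGE )
        return false;                /* value out of range */
    if ( v != 0 && v != 1 )
        return false;                /* only "0" and "1" are defined */
    *out = (v == 1);
    return true;
}
```

This is the "too much faff" version; the thread's conclusion is instead to document that values other than "0" are treated as "1".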
Jan Beulich
2013-Jun-21 12:58 UTC
Re: [PATCH v4 5/8] hvmloader: Correct bug in low mmio region accounting
>>> On 21.06.13 at 13:19, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> George Dunlap writes ("[PATCH v4 5/8] hvmloader: Correct bug in low mmio
> region accounting"):
>> When deciding whether to map a device in low MMIO space (<4GiB),
>> hvmloader compares it with "mmio_left", which is set to the size of
>> the low MMIO range (pci_mem_end - pci_mem_start).  However, even if it
>> does map a device in high MMIO space, it still removes the size of its
>> BAR from mmio_left.
>>
>> In reality we don't need to do a separate accounting of the low memory
>> available -- this can be calculated from mem_resource.  Just get rid
>> of the variable and the duplicate accounting entirely.  This will make
>> the code more robust.
> ...
>> -        using_64bar = bars[i].is_64bar && bar64_relocate && (mmio_left < bar_sz);
>> +        using_64bar = bars[i].is_64bar && bar64_relocate
>> +            && (bar_sz > (mem_resource.max - mem_resource.base));
>
> This is not entirely straightforward, I think.
>
> The actual calculation of whether it will fit, as opposed to the
> precalculation, is done here:
>
>         base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
>         bar_data |= (uint32_t)base;
>         bar_data_upper = (uint32_t)(base >> 32);
>         base += bar_sz;
>
>         if ( (base < resource->base) || (base > resource->max) )
>             [ ... doesn't fit ... ]
>
> The first line rounds the base up to a multiple of bar_sz.  I assume
> that this is a requirement of the PCI spec.
>
> (While I'm here I'll note that the (uint64_t) cast in that line is
> unnecessary and confusing.  If bar_sz weren't 64-bit this code would
> be quite wrong, and putting that cast there suggests that it might not
> be.)
>
> I infer (from "bar_sz &= ~(bar_sz - 1)") that bar_sz is supposed to
> always be a power of two.  And we have devices in descending order of
> size.  So at least after the first device, this rounding does nothing.
>
> But for the first device I think it may be possible for resource->base
> not to be a multiple of bar_sz, and in that case it might be that
> the precalculation thinks it will fit when the actual placement
> calculation doesn't.
>
> Do you think this is possible?

This is possible only from an abstract perspective, not in reality: with PCI_MEM_START being 0x{f,e,c,8}0000000, PCI_MEM_END being 0xfc000000, and allocations starting with the biggest BARs (where you already correctly noted that BARs are always a power of 2 in size), the current base address can be misaligned only when the BAR size is too large to fit anyway.  In that case it'll go into the space above 4GiB, and to that range the precalculation doesn't apply.

Jan
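Jan's argument can be checked mechanically: if allocation starts at an address aligned to the largest BAR, and BARs are power-of-two sized and placed in descending size order, the round-up in the placement step never actually moves the base. A small simulation under those assumptions (not hvmloader code):

```c
#include <assert.h>
#include <stdint.h>

/* Place power-of-two-sized BARs from 'base' in the given order and report
 * whether the align-up in the placement step ever moved the base. */
static int rounding_ever_needed(uint64_t base, const uint64_t *bar_sz, int n)
{
    int moved = 0;

    for ( int i = 0; i < n; i++ )
    {
        uint64_t aligned = (base + bar_sz[i] - 1) & ~(bar_sz[i] - 1);

        if ( aligned != base )
            moved = 1;
        base = aligned + bar_sz[i];
    }
    return moved;
}
```

With an ascending order the invariant breaks, which is why the descending sort matters to the argument.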
Ian Jackson
2013-Jun-21 13:32 UTC
Re: [PATCH v4 5/8] hvmloader: Correct bug in low mmio region accounting
Jan Beulich writes ("Re: [Xen-devel] [PATCH v4 5/8] hvmloader: Correct bug in low mmio region accounting"):
> On 21.06.13 at 13:19, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> > But for the first device I think it may be possible for resource->base
> > not to be a multiple of bar_sz, and in that case it might be that
> > the precalculation thinks it will fit when the actual placement
> > calculation doesn't.
> >
> > Do you think this is possible?
>
> This is possible only from an abstract perspective, not in reality:
> PCI_MEM_START being 0x{f,e,c,8}0000000, PCI_MEM_END being
> 0xfc000000, and allocations starting with the biggest BARs
> (where you already correctly noted that BARs are always a power
> of 2 in size), the current base address can be misaligned only
> when the BAR size is too large to fit anyway.  In that case it'll
> go into the space above 4GiB, and to that range the precalculation
> doesn't apply.

Ah.  Right.  Err, OK.  I'm convinced by this argument.

It's not a good reflection on the clarity of this code, though.  Perhaps, George, you could mention this issue in a comment or the commit message.

But anyway, this, and 6/8:

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap
2013-Jun-21 13:40 UTC
Re: [PATCH v4 5/8] hvmloader: Correct bug in low mmio region accounting
On 21/06/13 14:32, Ian Jackson wrote:
> Jan Beulich writes ("Re: [Xen-devel] [PATCH v4 5/8] hvmloader: Correct bug in low mmio region accounting"):
>> On 21.06.13 at 13:19, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>>> But for the first device I think it may be possible for resource->base
>>> not to be a multiple of bar_sz, and in that case it might be that
>>> the precalculation thinks it will fit when the actual placement
>>> calculation doesn't.
>>>
>>> Do you think this is possible?
>> This is possible only from an abstract perspective, not in reality:
>> PCI_MEM_START being 0x{f,e,c,8}0000000, PCI_MEM_END being
>> 0xfc000000, and allocations starting with the biggest BARs
>> (where you already correctly noted that BARs are always a power
>> of 2 in size), the current base address can be misaligned only
>> when the BAR size is too large to fit anyway.  In that case it'll
>> go into the space above 4GiB, and to that range the precalculation
>> doesn't apply.
> Ah.  Right.  Err, OK.  I'm convinced by this argument.
>
> It's not a good reflection on the clarity of this code, though.
> Perhaps, George, you could mention this issue in a comment or the
> commit message.

Yes, I think I shall.  It is, as Jan says, correct at the present moment, but it's not even clear whether that was by accident or by design; and even if it was by design, there's no guarantee it will remain so in the future without at least a comment.

We may want to try to clean this up long-term, but I would really like to investigate just punting this whole thing off to SeaBIOS, which is being tested and maintained by the KVM folks.

> But anyway, this, and 6/8,
>
> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

Great, thanks.  Stefano pointed out some "development process" terminology leaking into the comment on the last patch -- I'll clean that up, add in some comments about the fragile accounting, and send v5.  That should be it for this series, I think.

-George
Hao, Xudong
2013-Jun-26 10:08 UTC
Re: [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org
> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of George Dunlap
> Sent: Friday, June 21, 2013 6:47 PM
> To: xen-devel@lists.xen.org
> Cc: Keir Fraser; Ian Campbell; Hanweidong; George Dunlap; Stefano Stabellini;
> Ian Jackson
> Subject: [Xen-devel] [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for
> MMIO hole
>
> At the moment, qemu-xen can't handle memory being relocated by
> hvmloader.  This may happen if a device with a large enough memory
> region is passed through to the guest.  At the moment, if this
> happens, then at some point in the future qemu will crash and the
> domain will hang.  (qemu-traditional is fine.)
>
> It's too late in the release to do a proper fix, so we try to do
> damage control.
>
> hvmloader already has mechanisms to relocate memory to 64-bit space if
> it can't make a big enough MMIO hole.  By default this is 2GiB; if we
> just refuse to make the hole bigger if it will overlap with guest
> memory, then the relocation will happen by default.

For the qemu-xen use case, hvmloader starts the MMIO hole at 0xf0000000.  However, qemu-xen initializes the RAM region below 4G to end at 0xf0000000 (HVM_BELOW_4G_RAM_END), with the pci hole starting from 0xe0000000 -- does it overlap?

-thanks
Xudong
Stefano Stabellini
2013-Jun-26 13:36 UTC
Re: [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
On Wed, 26 Jun 2013, Hao, Xudong wrote:
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xen.org
> > [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of George Dunlap
> > Sent: Friday, June 21, 2013 6:47 PM
> > To: xen-devel@lists.xen.org
> > Cc: Keir Fraser; Ian Campbell; Hanweidong; George Dunlap; Stefano Stabellini;
> > Ian Jackson
> > Subject: [Xen-devel] [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for
> > MMIO hole
> >
> > At the moment, qemu-xen can't handle memory being relocated by
> > hvmloader.  This may happen if a device with a large enough memory
> > region is passed through to the guest.  At the moment, if this
> > happens, then at some point in the future qemu will crash and the
> > domain will hang.  (qemu-traditional is fine.)
> >
> > It's too late in the release to do a proper fix, so we try to do
> > damage control.
> >
> > hvmloader already has mechanisms to relocate memory to 64-bit space if
> > it can't make a big enough MMIO hole.  By default this is 2GiB; if we
> > just refuse to make the hole bigger if it will overlap with guest
> > memory, then the relocation will happen by default.
>
> For the qemu-xen use case, hvmloader starts the MMIO hole at 0xf0000000.
> However, qemu-xen initializes the RAM region below 4G to end at
> 0xf0000000 (HVM_BELOW_4G_RAM_END), with the pci hole starting from
> 0xe0000000 -- does it overlap?

hvmloader configures the MMIO hole to start at 0xf0000000, qemu-xen configures the below_4g_mem_size ram region to *end* at 0xf0000000, and the pci hole to start from 0xf0000000.  It's all coherent now.

The patch to modify the pci hole in qemu-xen and have it start at 0xe0000000 has been reverted.
Hao, Xudong
2013-Jun-26 14:23 UTC
Re: [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
> -----Original Message-----
> From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> Sent: Wednesday, June 26, 2013 9:36 PM
> To: Hao, Xudong
> Cc: George Dunlap; xen-devel@lists.xen.org; Keir Fraser; Ian Campbell;
> Hanweidong; Stefano Stabellini; Ian Jackson
> Subject: RE: [Xen-devel] [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory
> for MMIO hole
>
> On Wed, 26 Jun 2013, Hao, Xudong wrote:
> > > -----Original Message-----
> > > From: xen-devel-bounces@lists.xen.org
> > > [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of George Dunlap
> > > Sent: Friday, June 21, 2013 6:47 PM
> > > To: xen-devel@lists.xen.org
> > > Cc: Keir Fraser; Ian Campbell; Hanweidong; George Dunlap; Stefano Stabellini;
> > > Ian Jackson
> > > Subject: [Xen-devel] [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for
> > > MMIO hole
> > >
> > > At the moment, qemu-xen can't handle memory being relocated by
> > > hvmloader.  This may happen if a device with a large enough memory
> > > region is passed through to the guest.  At the moment, if this
> > > happens, then at some point in the future qemu will crash and the
> > > domain will hang.  (qemu-traditional is fine.)
> > >
> > > It's too late in the release to do a proper fix, so we try to do
> > > damage control.
> > >
> > > hvmloader already has mechanisms to relocate memory to 64-bit space if
> > > it can't make a big enough MMIO hole.  By default this is 2GiB; if we
> > > just refuse to make the hole bigger if it will overlap with guest
> > > memory, then the relocation will happen by default.
> >
> > For the qemu-xen use case, hvmloader starts the MMIO hole at 0xf0000000.
> > However, qemu-xen initializes the RAM region below 4G to end at
> > 0xf0000000 (HVM_BELOW_4G_RAM_END), with the pci hole starting from
> > 0xe0000000 -- does it overlap?
>
> hvmloader configures the MMIO hole to start at 0xf0000000, qemu-xen
> configures the below_4g_mem_size ram region to *end* at 0xf0000000 and

That's right.

> the pci hole to start from 0xf0000000.  It's all coherent now.
>

Current qemu upstream configures the pci hole as below; do I miss something?

    if (ram_size >= 0xe0000000 ) {
        above_4g_mem_size = ram_size - 0xe0000000;
        below_4g_mem_size = 0xe0000000;
    } else {
        above_4g_mem_size = 0;
        below_4g_mem_size = ram_size;
    }

Thanks,
-Xudong

> The patch to modify the pci hole in qemu-xen and have it start at
> 0xe0000000 has been reverted.
Stefano Stabellini
2013-Jun-26 16:21 UTC
Re: [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for MMIO hole
On Wed, 26 Jun 2013, Hao, Xudong wrote:
> > -----Original Message-----
> > From: Stefano Stabellini [mailto:stefano.stabellini@eu.citrix.com]
> > Sent: Wednesday, June 26, 2013 9:36 PM
> > To: Hao, Xudong
> > Cc: George Dunlap; xen-devel@lists.xen.org; Keir Fraser; Ian Campbell;
> > Hanweidong; Stefano Stabellini; Ian Jackson
> > Subject: RE: [Xen-devel] [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory
> > for MMIO hole
> >
> > On Wed, 26 Jun 2013, Hao, Xudong wrote:
> > > > -----Original Message-----
> > > > From: xen-devel-bounces@lists.xen.org
> > > > [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of George Dunlap
> > > > Sent: Friday, June 21, 2013 6:47 PM
> > > > To: xen-devel@lists.xen.org
> > > > Cc: Keir Fraser; Ian Campbell; Hanweidong; George Dunlap; Stefano Stabellini;
> > > > Ian Jackson
> > > > Subject: [Xen-devel] [PATCH v4 8/8] libxl, hvmloader: Don't relocate memory for
> > > > MMIO hole
> > > >
> > > > At the moment, qemu-xen can't handle memory being relocated by
> > > > hvmloader.  This may happen if a device with a large enough memory
> > > > region is passed through to the guest.  At the moment, if this
> > > > happens, then at some point in the future qemu will crash and the
> > > > domain will hang.  (qemu-traditional is fine.)
> > > >
> > > > It's too late in the release to do a proper fix, so we try to do
> > > > damage control.
> > > >
> > > > hvmloader already has mechanisms to relocate memory to 64-bit space if
> > > > it can't make a big enough MMIO hole.  By default this is 2GiB; if we
> > > > just refuse to make the hole bigger if it will overlap with guest
> > > > memory, then the relocation will happen by default.
> > >
> > > For the qemu-xen use case, hvmloader starts the MMIO hole at 0xf0000000.
> > > However, qemu-xen initializes the RAM region below 4G to end at
> > > 0xf0000000 (HVM_BELOW_4G_RAM_END), with the pci hole starting from
> > > 0xe0000000 -- does it overlap?
> >
> > hvmloader configures the MMIO hole to start at 0xf0000000, qemu-xen
> > configures the below_4g_mem_size ram region to *end* at 0xf0000000 and
>
> That's right.
>
> > the pci hole to start from 0xf0000000.  It's all coherent now.
> >
>
> Current qemu upstream configures the pci hole as below; do I miss something?
>
>     if (ram_size >= 0xe0000000 ) {
>         above_4g_mem_size = ram_size - 0xe0000000;
>         below_4g_mem_size = 0xe0000000;
>     } else {
>         above_4g_mem_size = 0;
>         below_4g_mem_size = ram_size;
>     }

That's the non-Xen case.  Have a look at xen-all.c:xen_ram_init:

    if (ram_size >= HVM_BELOW_4G_RAM_END) {
        above_4g_mem_size = ram_size - HVM_BELOW_4G_RAM_END;
        below_4g_mem_size = HVM_BELOW_4G_RAM_END;
    } else {
        below_4g_mem_size = ram_size;
    }
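The below/above-4G split Stefano points to can be expressed as a tiny helper. This is a sketch mirroring the quoted xen_ram_init logic, not QEMU code; HVM_BELOW_4G_RAM_END is assumed to be 0xf0000000, its value in Xen's public headers:

```c
#include <assert.h>
#include <stdint.h>

#define HVM_BELOW_4G_RAM_END 0xf0000000ULL  /* assumed, per Xen public headers */

/* Mirror of xen_ram_init's split: RAM up to the start of the MMIO hole
 * stays below 4GiB; anything beyond that is pushed above 4GiB. */
static void split_ram(uint64_t ram_size, uint64_t *below, uint64_t *above)
{
    if ( ram_size >= HVM_BELOW_4G_RAM_END )
    {
        *below = HVM_BELOW_4G_RAM_END;
        *above = ram_size - HVM_BELOW_4G_RAM_END;
    }
    else
    {
        *below = ram_size;
        *above = 0;
    }
}
```

So a 4GiB guest ends up with 0xf0000000 bytes of RAM below 4GiB and the remaining 256MiB above, matching hvmloader's default MMIO hole starting at 0xf0000000.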