This patch series concerns QEMU. Another serie has already came for Xen. As we discussed on Xen/QEMU mailing list (http://marc.info/?l=qemu-devel&m=133042969527515), I have worked on multiple QEMU support for one domain. QEMU must registered all IO ranges (MMIO and PIO) and PCI that it''s want''s to use to allow multiple QEMU. Each QEMU will handle a subset of the hardware. It will retrieve its configuration with XenStore. Both of these patch series (one for Xen, the other Xen) are not complete, and it breaks some parts of Xen ... The purpose of these series is to start a discussion on how to implement multiple ioreq server on Xen and QEMU. Julien Grall (6): option: Add -xen-dmid xen: Add functions to register PCI and IO in Xen memory: Add xen memory hook xen-pci: Register PCI in Xen xen-io: Handle the new ioreq type IOREQ_TYPE_PCI_CONFIG xen: handle qemu disaggregation exec.c | 9 ++ hw/pci.c | 6 ++ hw/xen.h | 4 + ioport.c | 17 ++++ qemu-options.hx | 2 + vl.c | 8 ++ xen-all.c | 231 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- xen-stub.c | 13 +++ 8 files changed, 287 insertions(+), 3 deletions(-) -- Julien Grall
With this option, QEMU knows it''s ID and can retrieve it''s configuration from XenStore. Signed-off-by: Julien Grall <julien.grall@citrix.com> --- hw/xen.h | 1 + qemu-options.hx | 2 ++ vl.c | 8 ++++++++ 3 files changed, 11 insertions(+), 0 deletions(-) diff --git a/hw/xen.h b/hw/xen.h index b46879c..b056b13 100644 --- a/hw/xen.h +++ b/hw/xen.h @@ -18,6 +18,7 @@ enum xen_mode { }; extern uint32_t xen_domid; +extern uint32_t xen_dmid; extern enum xen_mode xen_mode; extern int xen_allowed; diff --git a/qemu-options.hx b/qemu-options.hx index daefce3..95c87f8 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2364,6 +2364,8 @@ DEF("xen-attach", 0, QEMU_OPTION_xen_attach, "-xen-attach attach to existing xen domain\n" " xend will use this when starting qemu\n", QEMU_ARCH_ALL) +DEF("xen-dmid", HAS_ARG, QEMU_OPTION_xen_dmid, + "-xen-dmid id specify the device model id for xenstore\n", QEMU_ARCH_ALL) STEXI @item -xen-domid @var{id} @findex -xen-domid diff --git a/vl.c b/vl.c index bd95539..36f2111 100644 --- a/vl.c +++ b/vl.c @@ -263,6 +263,7 @@ int kvm_allowed = 0; int xen_allowed = 0; uint32_t xen_domid; enum xen_mode xen_mode = XEN_EMULATE; +uint32_t xen_dmid = 0; static int tcg_tb_size; static int default_serial = 1; @@ -3132,6 +3133,13 @@ int main(int argc, char **argv, char **envp) } xen_mode = XEN_ATTACH; break; + case QEMU_OPTION_xen_dmid: + if (!(xen_available())) { + printf("Option %s not supported for this target\n", popt->name); + exit(1); + } + xen_dmid = atoi(optarg); + break; case QEMU_OPTION_trace: { opts = qemu_opts_parse(qemu_find_opts("trace"), optarg, 0); -- Julien Grall
Julien Grall
2012-Mar-22 16:01 UTC
[QEMU][RFC PATCH 2/6] xen: Add functions to register PCI and IO in Xen
Add interface for the new xen hypercalls Signed-off-by: Julien Grall <julien.grall@citrix.com> --- hw/xen.h | 3 +++ xen-all.c | 2 ++ xen-stub.c | 13 +++++++++++++ 3 files changed, 18 insertions(+), 0 deletions(-) diff --git a/hw/xen.h b/hw/xen.h index b056b13..a76616f 100644 --- a/hw/xen.h +++ b/hw/xen.h @@ -35,6 +35,9 @@ static inline int xen_enabled(void) int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num); void xen_piix3_set_irq(void *opaque, int irq_num, int level); void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len); +int xen_register_pcidev(PCIDevice *pci_dev); +void xen_map_iorange(uint64_t addr, uint64_t size, int is_mmio); +void xen_unmap_iorange(uint64_t addr, uint64_t size, int is_mmio); void xen_cmos_set_s3_resume(void *opaque, int irq, int level); qemu_irq *xen_interrupt_controller_init(void); diff --git a/xen-all.c b/xen-all.c index 493112b..f007278 100644 --- a/xen-all.c +++ b/xen-all.c @@ -36,6 +36,8 @@ static MemoryRegion ram_memory, ram_640k, ram_lo, ram_hi; static MemoryRegion *framebuffer; +static unsigned int serverid; +static int is_running = 0; /* Compatibility with older version */ #if __XEN_LATEST_INTERFACE_VERSION__ < 0x0003020a diff --git a/xen-stub.c b/xen-stub.c index 9ea02d4..235640f 100644 --- a/xen-stub.c +++ b/xen-stub.c @@ -25,10 +25,23 @@ void xen_piix3_set_irq(void *opaque, int irq_num, int level) { } +int xen_register_pcidev(PCIDevice *pci_dev) +{ + return 1; +} + void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len) { } +void xen_map_iorange(uint64_t addr, uint64_t size, int is_mmio) +{ +} + +void xen_unmap_iorange(uint64_t addr, uint64_t size, int is_mmio) +{ +} + void xen_cmos_set_s3_resume(void *opaque, int irq, int level) { } -- Julien Grall
QEMU will now register all memory range (PIO and MMIO) in Xen. We distinct two phases in memory registered : - initialization - running For all range registered during the initialization, QEMU will check with XenStore if it is authorized to use them. After the initialization, QEMU can register all range. Indeed, the new ranges will be for PCI Bar. Signed-off-by: Julien Grall <julien.grall@citrix.com> --- exec.c | 9 ++++++ ioport.c | 17 ++++++++++++ xen-all.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 109 insertions(+), 0 deletions(-) diff --git a/exec.c b/exec.c index 780f63f..42d8c56 100644 --- a/exec.c +++ b/exec.c @@ -3557,12 +3557,21 @@ static void core_commit(MemoryListener *listener) static void core_region_add(MemoryListener *listener, MemoryRegionSection *section) { + if (xen_enabled()) { + xen_map_iorange(section->offset_within_address_space, + section->size, 1); + } + cpu_register_physical_memory_log(section, section->readonly); } static void core_region_del(MemoryListener *listener, MemoryRegionSection *section) { + if (xen_enabled()) { + xen_unmap_iorange(section->offset_within_address_space, + section->size, 1); + } } static void core_region_nop(MemoryListener *listener, diff --git a/ioport.c b/ioport.c index 78a3b89..073ed75 100644 --- a/ioport.c +++ b/ioport.c @@ -28,6 +28,7 @@ #include "ioport.h" #include "trace.h" #include "memory.h" +#include "hw/xen.h" /***********************************************************/ /* IO Port */ @@ -155,6 +156,11 @@ int register_ioport_read(pio_addr_t start, int length, int size, i); ioport_opaque[i] = opaque; } + + if (xen_enabled()) { + xen_map_iorange(start, length, 0); + } + return 0; } @@ -175,7 +181,13 @@ int register_ioport_write(pio_addr_t start, int length, int size, i); ioport_opaque[i] = opaque; } + + if (xen_enabled()) { + xen_map_iorange(start, length, 0); + } + return 0; + } static uint32_t ioport_readb_thunk(void *opaque, uint32_t addr) @@ -260,6 +272,11 @@ void isa_unassign_ioport(pio_addr_t start, int length) ioport_destructor_table[start](ioport_opaque[start]); ioport_destructor_table[start] = NULL; } + + if (xen_enabled()) { + xen_unmap_iorange(start, length, 0); + } + for(i = start; i < start + length; i++) { ioport_read_table[0][i] = NULL; ioport_read_table[1][i] = NULL; diff --git a/xen-all.c b/xen-all.c index f007278..366bafe 100644 --- a/xen-all.c +++ b/xen-all.c @@ -124,6 +124,89 @@ void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len) } } +static void str_to_range(const char *str, uint64_t *begin, uint64_t *end) +{ + char *buf; + + /* We assume that range is valid */ + *begin = strtoll(str, &buf, 0); + if (buf[0] == ''\0'') { + *end = *begin; + return; + } + + str = buf + 1; + *end = strtoll(str, &buf, 0); +} + +static int check_range(uint64_t addr, uint64_t size, int is_mmio) +{ + struct xs_handle *xs = NULL; + char **dir; + char *str; + int rc = 1; + int i; + uint64_t begin; + uint64_t end; + char base[70]; + char path[70]; + unsigned int nb; + unsigned int len; + + xs = xs_open(0); + if (!xs) { + fprintf(stderr, "check_range: unable to open xenstore\n"); + return 1; + } + + snprintf(base, sizeof (base), "/local/domain/%u/image/dms/%u/%s", + xen_domid, xen_dmid, (is_mmio) ? "mmio" : "pio"); + dir = xs_directory(xs, XBT_NULL, base, &nb); + + if (dir) { + for (i = 0; i < nb; i++) { + snprintf(path, sizeof (path), "%s/%s", base, dir[i]); + str = xs_read(xs, XBT_NULL, path, &len); + str_to_range(str, &begin, &end); + free(str); + if (addr == begin && (addr + size - 1) == end) { + rc = 0; + break; + } + } + free(dir); + } + + return rc; +} + +void xen_map_iorange(uint64_t addr, uint64_t size, int is_mmio) +{ + static uint64_t previous_addr = ~0; + + /* Don''t register multiple times the same ioport */ + if (!is_mmio) { + if (addr == previous_addr) + return; + previous_addr = addr; + } + + if (!is_running) { + if (check_range(addr, size, is_mmio)) { + return; + } + } + + xc_hvm_map_io_range_to_ioreq_server(xen_xc, xen_domid, serverid, is_mmio, + addr, addr + size - 1); +} + +void xen_unmap_iorange(uint64_t addr, uint64_t size, int is_mmio) +{ + xc_hvm_unmap_io_range_from_ioreq_server(xen_xc, xen_domid, serverid, is_mmio, + addr); +} + static void xen_suspend_notifier(Notifier *notifier, void *data) { xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 3); -- Julien Grall
QEMU will now register PCI in Xen. It will usefull to forward IO config space to the right QEMU. Before to register a PCI device, QEMU will check with XenStore if it is autorized to register the PCI associate to a given BDF. Signed-off-by: Julien Grall <julien.grall@citrix.com> --- hw/pci.c | 6 +++++ xen-all.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 76 insertions(+), 0 deletions(-) diff --git a/hw/pci.c b/hw/pci.c index bf046bf..4df4449 100644 --- a/hw/pci.c +++ b/hw/pci.c @@ -31,6 +31,7 @@ #include "loader.h" #include "range.h" #include "qmp-commands.h" +#include "xen.h" //#define DEBUG_PCI #ifdef DEBUG_PCI @@ -764,6 +765,11 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, pci_dev->devfn = devfn; pstrcpy(pci_dev->name, sizeof(pci_dev->name), name); pci_dev->irq_state = 0; + + if (xen_enabled() && xen_register_pcidev(pci_dev)) { + return NULL; + } + pci_config_alloc(pci_dev); pci_config_set_vendor_id(pci_dev->config, pc->vendor_id); diff --git a/xen-all.c b/xen-all.c index 366bafe..2f5405c 100644 --- a/xen-all.c +++ b/xen-all.c @@ -107,6 +107,76 @@ void xen_piix3_set_irq(void *opaque, int irq_num, int level) irq_num & 3, level); } +static uint32_t str_to_bdf(const char *str) +{ + /* We assume that bdf is valid */ + char *buf; + uint32_t bdf; + uint32_t tmp; + + bdf = strtol(str, &buf, 16); + buf++; + + str = buf; + tmp = strtol(str, &buf, 16); + bdf = (bdf << 16) | (tmp << 11); + buf++; + + str = buf; + tmp = strtol(str, &buf, 16); + bdf = bdf | (tmp << 8); + + return bdf; +} + +int xen_register_pcidev(PCIDevice *pci_dev) +{ + struct xs_handle *xs = NULL; + uint32_t bdf = 0; + uint32_t allowed_bdf = 0; + char **dir; + char path[50]; + unsigned int nb; + unsigned int i; + unsigned int len; + char *str; + int rc = 0; + + + /* Fix : missing bus id to be more generic */ + bdf |= pci_dev->devfn << 8; + + xs = xs_open(0); + if (!xs) { + fprintf(stderr, "pci_register: Unable to open xenstore\n"); + return 1; + } + + snprintf(path, sizeof (path), "/local/domain/%u/image/dms/%u/pci", + xen_domid, xen_dmid); + + dir = xs_directory(xs, XBT_NULL, path, &nb); + if (dir) { + for (i = 0; i < nb; i++) { + snprintf(path, sizeof (path), "/local/domain/%u/image/dms/%u/pci/%s", + xen_domid, xen_dmid, dir[i]); + str = xs_read(xs, XBT_NULL, path, &len); + allowed_bdf = str_to_bdf(str); + free(str); + if (bdf == allowed_bdf) + { + rc = xc_hvm_register_pcidev(xen_xc, xen_domid, serverid, bdf); + break; + } + } + free(dir); + } + + xs_close(xs); + + return rc; +} + void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len) { int i; -- Julien Grall
Julien Grall
2012-Mar-22 16:01 UTC
[QEMU][RFC PATCH 5/6] xen-io: Handle the new ioreq type IOREQ_TYPE_PCI_CONFIG
This ioreq type is introduced to handle easily the access to the PCI config space. Indeed, all PCI config spaces are access by the same IO ports (cf8 -> cff). Signed-off-by: Julien Grall <julien.grall@citrix.com> --- xen-all.c | 14 ++++++++++++++ 1 files changed, 14 insertions(+), 0 deletions(-) diff --git a/xen-all.c b/xen-all.c index 2f5405c..2d001b8 100644 --- a/xen-all.c +++ b/xen-all.c @@ -853,6 +853,17 @@ static void cpu_ioreq_move(ioreq_t *req) } } +static void cpu_ioreq_config_space(ioreq_t *req) +{ + uint64_t addr = req->addr; + uint64_t cf8 = req->addr & (~0x3); + + req->addr = 0xcfc + (addr & 0x3); + do_outp(0xcf8, 4, cf8); + cpu_ioreq_pio(req); + req->addr = addr; +} + static void handle_ioreq(ioreq_t *req) { if (!req->data_is_ptr && (req->dir == IOREQ_WRITE) && @@ -872,6 +883,9 @@ static void handle_ioreq(ioreq_t *req) case IOREQ_TYPE_INVALIDATE: xen_invalidate_map_cache(); break; + case IOREQ_TYPE_PCI_CONFIG: + cpu_ioreq_config_space(req); + break; default: hw_error("Invalid ioreq type 0x%x\n", req->type); } -- Julien Grall
* Register QEMU in Xen as server * Retrieve it''s own shared pages * Check if the page is already mapping before to populate Signed-off-by: Julien Grall <julien.grall@citrix.com> --- xen-all.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 59 insertions(+), 3 deletions(-) diff --git a/xen-all.c b/xen-all.c index 2d001b8..6b7acd7 100644 --- a/xen-all.c +++ b/xen-all.c @@ -61,6 +61,45 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu) } # define FMT_ioreq_size "u" #endif +#if __XEN_LATEST_INTERFACE_VERSION__ < 0x00040200 +static inline unsigned long xen_buffered_iopage() +{ + unsigned long pfn; + + xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &pfn); + + return pfn; +} + +static inline unsigned long xen_iopage(void) +{ + unsigned long pfn; + + xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &pfn); + + return pfn; +} +#else +static inline unsigned long xen_buffered_iopage(void) +{ + unsigned long pfn; + + xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IO_PFN_FIRST, &pfn); + pfn += (serverid - 1) * 2 + 2; + + return pfn; +} + +static inline unsigned long xen_iopage(void) +{ + unsigned long pfn; + + xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IO_PFN_FIRST, &pfn); + pfn += (serverid - 1) * 2 + 1; + + return pfn; +} +#endif #define BUFFER_IO_MAX_DELAY 100 @@ -349,6 +388,10 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr) return; } + if (xen_map_cache(ram_addr, size, 0)) { + return; + } + trace_xen_ram_alloc(ram_addr, size); nr_pfn = size >> TARGET_PAGE_BITS; @@ -1046,7 +1089,14 @@ static void xenstore_record_dm_state(struct xs_handle *xs, const char *state) exit(1); } - snprintf(path, sizeof (path), "/local/domain/0/device-model/%u/state", xen_domid); + if (!xen_dmid) { + snprintf(path, sizeof (path), "/local/domain/0/device-model/%u/state", xen_domid); + } + else { + snprintf(path, sizeof (path), "/local/domain/0/dms/%u/%u/state", + xen_domid, xen_dmid); + } + if (!xs_write(xs, XBT_NULL, path, state, strlen(state))) { fprintf(stderr, "error recording dm state\n"); exit(1); @@ -1077,6 +1127,7 @@ static void xen_change_state_handler(void *opaque, int running, RunState state) { if (running) { + is_running = 1; /* record state running */ xenstore_record_dm_state(xenstore, "running"); } @@ -1137,7 +1188,12 @@ int xen_hvm_init(void) state->suspend.notify = xen_suspend_notifier; qemu_register_suspend_notifier(&state->suspend); - xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn); + rc = xc_hvm_register_ioreq_server(xen_xc, xen_domid, &serverid); + + if (rc) + hw_error("registered server returned error %d", rc); + + ioreq_pfn = xen_iopage(); DPRINTF("shared page at pfn %lx\n", ioreq_pfn); state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE, PROT_READ|PROT_WRITE, ioreq_pfn); @@ -1146,7 +1202,7 @@ int xen_hvm_init(void) errno, xen_xc); } - xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn); + ioreq_pfn = xen_buffered_iopage(); DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn); state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE, PROT_READ|PROT_WRITE, ioreq_pfn); -- Julien Grall
On 2012-03-22 17:01, Julien Grall wrote:> With this option, QEMU knows it''s ID and can retrieve it''s configuration > from XenStore.Isn''t this better modeled as a (Xen) machine option? I''d like to avoid more "special" command line switch proliferation. Jan
On 2012-03-22 17:01, Julien Grall wrote:> QEMU will now register all memory range (PIO and MMIO) in Xen. > We distinct two phases in memory registered : > - initialization > - running > > For all range registered during the initialization, QEMU will > check with XenStore if it is authorized to use them. > After the initialization, QEMU can register all range. Indeed, > the new ranges will be for PCI Bar. > > Signed-off-by: Julien Grall <julien.grall@citrix.com> > --- > exec.c | 9 ++++++ > ioport.c | 17 ++++++++++++ > xen-all.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 109 insertions(+), 0 deletions(-) > > diff --git a/exec.c b/exec.c > index 780f63f..42d8c56 100644 > --- a/exec.c > +++ b/exec.c > @@ -3557,12 +3557,21 @@ static void core_commit(MemoryListener *listener) > static void core_region_add(MemoryListener *listener, > MemoryRegionSection *section) > { > + if (xen_enabled()) { > + xen_map_iorange(section->offset_within_address_space, > + section->size, 1); > + } > + > cpu_register_physical_memory_log(section, section->readonly); > } > > static void core_region_del(MemoryListener *listener, > MemoryRegionSection *section) > { > + if (xen_enabled()) { > + xen_unmap_iorange(section->offset_within_address_space, > + section->size, 1); > + } > }memory_listener_register(xen_io_hooks, system_memory)?> > static void core_region_nop(MemoryListener *listener, > diff --git a/ioport.c b/ioport.c > index 78a3b89..073ed75 100644 > --- a/ioport.c > +++ b/ioport.c > @@ -28,6 +28,7 @@ > #include "ioport.h" > #include "trace.h" > #include "memory.h" > +#include "hw/xen.h" > > /***********************************************************/ > /* IO Port */ > @@ -155,6 +156,11 @@ int register_ioport_read(pio_addr_t start, int length, int size, > i); > ioport_opaque[i] = opaque; > } > + > + if (xen_enabled()) { > + xen_map_iorange(start, length, 0); > + } > + > return 0; > } > > @@ -175,7 +181,13 @@ int register_ioport_write(pio_addr_t start, int length, int size, > i); > ioport_opaque[i] = opaque; > } > + > + if (xen_enabled()) { > + xen_map_iorange(start, length, 0); > + } > + > return 0; > + > } > > static uint32_t ioport_readb_thunk(void *opaque, uint32_t addr) > @@ -260,6 +272,11 @@ void isa_unassign_ioport(pio_addr_t start, int length) > ioport_destructor_table[start](ioport_opaque[start]); > ioport_destructor_table[start] = NULL; > } > + > + if (xen_enabled()) { > + xen_unmap_iorange(start, length, 0); > + } > + > for(i = start; i < start + length; i++) { > ioport_read_table[0][i] = NULL; > ioport_read_table[1][i] = NULL;memory_listener_register(xen_hooks, system_io)? Even if that is not yet powerful enough, tuning the hooks is usually better than open-coding. Jan
On 2012-03-22 17:01, Julien Grall wrote:> QEMU will now register PCI in Xen. It will usefull to forward > IO config space to the right QEMU. > > Before to register a PCI device, QEMU will check with XenStore if it is > autorized to register the PCI associate to a given BDF. > > Signed-off-by: Julien Grall <julien.grall@citrix.com> > --- > hw/pci.c | 6 +++++ > xen-all.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 76 insertions(+), 0 deletions(-) > > diff --git a/hw/pci.c b/hw/pci.c > index bf046bf..4df4449 100644 > --- a/hw/pci.c > +++ b/hw/pci.c > @@ -31,6 +31,7 @@ > #include "loader.h" > #include "range.h" > #include "qmp-commands.h" > +#include "xen.h" > > //#define DEBUG_PCI > #ifdef DEBUG_PCI > @@ -764,6 +765,11 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, > pci_dev->devfn = devfn; > pstrcpy(pci_dev->name, sizeof(pci_dev->name), name); > pci_dev->irq_state = 0; > + > + if (xen_enabled() && xen_register_pcidev(pci_dev)) { > + return NULL; > + } > +That is doomed to break as QEMU evolves. What magical code is supposed to be above, what below this ha^Whook?> pci_config_alloc(pci_dev); > > pci_config_set_vendor_id(pci_dev->config, pc->vendor_id);Jan
Anthony Liguori
2012-Mar-22 19:58 UTC
Re: [QEMU][RFC PATCH 4/6] xen-pci: Register PCI in Xen
On 03/22/2012 11:01 AM, Julien Grall wrote:> QEMU will now register PCI in Xen. It will usefull to forward > IO config space to the right QEMU. > > Before to register a PCI device, QEMU will check with XenStore if it is > autorized to register the PCI associate to a given BDF. > > Signed-off-by: Julien Grall<julien.grall@citrix.com>As a general rule, any time you call to XenStore from QEMU, you''re doing something wrong. You should explicitly launch QEMU with the PCI devices that you want it to have. If there are platform devices you don''t want it to have, then you need to construct a machine that lacks those devices. Random hooks in non-Xen code that call into XenStore are always wrong. Regards, Anthony Liguori> --- > hw/pci.c | 6 +++++ > xen-all.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 76 insertions(+), 0 deletions(-) > > diff --git a/hw/pci.c b/hw/pci.c > index bf046bf..4df4449 100644 > --- a/hw/pci.c > +++ b/hw/pci.c > @@ -31,6 +31,7 @@ > #include "loader.h" > #include "range.h" > #include "qmp-commands.h" > +#include "xen.h" > > //#define DEBUG_PCI > #ifdef DEBUG_PCI > @@ -764,6 +765,11 @@ static PCIDevice *do_pci_register_device(PCIDevice *pci_dev, PCIBus *bus, > pci_dev->devfn = devfn; > pstrcpy(pci_dev->name, sizeof(pci_dev->name), name); > pci_dev->irq_state = 0; > + > + if (xen_enabled()&& xen_register_pcidev(pci_dev)) { > + return NULL; > + } > + > pci_config_alloc(pci_dev); > > pci_config_set_vendor_id(pci_dev->config, pc->vendor_id); > diff --git a/xen-all.c b/xen-all.c > index 366bafe..2f5405c 100644 > --- a/xen-all.c > +++ b/xen-all.c > @@ -107,6 +107,76 @@ void xen_piix3_set_irq(void *opaque, int irq_num, int level) > irq_num& 3, level); > } > > +static uint32_t str_to_bdf(const char *str) > +{ > + /* We assume that bdf is valid */ > + char *buf; > + uint32_t bdf; > + uint32_t tmp; > + > + bdf = strtol(str,&buf, 16); > + buf++; > + > + str = buf; > + tmp = strtol(str,&buf, 16); > + bdf = (bdf<< 16) | (tmp<< 11); > + buf++; > + > + str = buf; > + tmp = strtol(str,&buf, 16); > + bdf = bdf | (tmp<< 8); > + > + return bdf; > +} > + > +int xen_register_pcidev(PCIDevice *pci_dev) > +{ > + struct xs_handle *xs = NULL; > + uint32_t bdf = 0; > + uint32_t allowed_bdf = 0; > + char **dir; > + char path[50]; > + unsigned int nb; > + unsigned int i; > + unsigned int len; > + char *str; > + int rc = 0; > + > + > + /* Fix : missing bus id to be more generic */ > + bdf |= pci_dev->devfn<< 8; > + > + xs = xs_open(0); > + if (!xs) { > + fprintf(stderr, "pci_register: Unable to open xenstore\n"); > + return 1; > + } > + > + snprintf(path, sizeof (path), "/local/domain/%u/image/dms/%u/pci", > + xen_domid, xen_dmid); > + > + dir = xs_directory(xs, XBT_NULL, path,&nb); > + if (dir) { > + for (i = 0; i< nb; i++) { > + snprintf(path, sizeof (path), "/local/domain/%u/image/dms/%u/pci/%s", > + xen_domid, xen_dmid, dir[i]); > + str = xs_read(xs, XBT_NULL, path,&len); > + allowed_bdf = str_to_bdf(str); > + free(str); > + if (bdf == allowed_bdf) > + { > + rc = xc_hvm_register_pcidev(xen_xc, xen_domid, serverid, bdf); > + break; > + } > + } > + free(dir); > + } > + > + xs_close(xs); > + > + return rc; > +} > + > void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len) > { > int i;
On Thu, 22 Mar 2012, Jan Kiszka wrote:> On 2012-03-22 17:01, Julien Grall wrote: > > With this option, QEMU knows it''s ID and can retrieve it''s configuration > > from XenStore. > > Isn''t this better modeled as a (Xen) machine option? I''d like to avoid > more "special" command line switch proliferation.I agree.
Stefano Stabellini
2012-Mar-23 10:44 UTC
Re: [QEMU][RFC PATCH 2/6] xen: Add functions to register PCI and IO in Xen
On Thu, 22 Mar 2012, Julien Grall wrote:> Add interface for the new xen hypercalls > > Signed-off-by: Julien Grall <julien.grall@citrix.com> > --- > hw/xen.h | 3 +++ > xen-all.c | 2 ++ > xen-stub.c | 13 +++++++++++++ > 3 files changed, 18 insertions(+), 0 deletions(-) > > diff --git a/hw/xen.h b/hw/xen.h > index b056b13..a76616f 100644 > --- a/hw/xen.h > +++ b/hw/xen.h > @@ -35,6 +35,9 @@ static inline int xen_enabled(void) > int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num); > void xen_piix3_set_irq(void *opaque, int irq_num, int level); > void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len); > +int xen_register_pcidev(PCIDevice *pci_dev); > +void xen_map_iorange(uint64_t addr, uint64_t size, int is_mmio); > +void xen_unmap_iorange(uint64_t addr, uint64_t size, int is_mmio); > void xen_cmos_set_s3_resume(void *opaque, int irq, int level); > > qemu_irq *xen_interrupt_controller_init(void);you should add to this patch the Xen version of these functions too, otherwise it will break Xen builds.
Stefano Stabellini
2012-Mar-23 11:02 UTC
Re: [QEMU][RFC PATCH 4/6] xen-pci: Register PCI in Xen
On Thu, 22 Mar 2012, Anthony Liguori wrote:> On 03/22/2012 11:01 AM, Julien Grall wrote: > > QEMU will now register PCI in Xen. It will usefull to forward > > IO config space to the right QEMU. > > > > Before to register a PCI device, QEMU will check with XenStore if it is > > autorized to register the PCI associate to a given BDF. > > > > Signed-off-by: Julien Grall<julien.grall@citrix.com> > > As a general rule, any time you call to XenStore from QEMU, you''re doing > something wrong. > > You should explicitly launch QEMU with the PCI devices that you want it to have. > If there are platform devices you don''t want it to have, then you need to > construct a machine that lacks those devices. > > Random hooks in non-Xen code that call into XenStore are always wrong.Maybe the best thing to do is to have a set of machine specific options to select what devices need to be built in the machine. Most devices already can be dynamically selected: NICs, usb, acpi, cirrus, etc. We already have a Xen machine (xenfv_machine) that uses pc_init1 to initialize it. We could have a simple bitmask to determine what devices need to be enabled, then in pc_init1 we could have something like: if (devices & VGA_ENABLE) { pc_vga_init(); } Given the number of enable variable already present in the code (pci_enabled, acpi_enabled, usb_enabled, xen_enabled, cirrus_vga_enabled, ...), it might end up actually making the code more readable. The flexibility could end up being useful in the generic case as well, for testing if nothing else. We would still need to call xc_hvm_register_pcidev to register PCI devices in Xen, but we wouldn''t expect to fail unless there was a misconfiguration somewhere. In that case we could just exit. Now the problem is: there isn''t a simple way to specify the BDF where you want to create the device; pci_create_simple takes a devfn but most of the higher level functions (pc_vga_init, pci_nic_init_nofail, ...) don''t export the parameter at the moment. We would need to be able to tell pc_vga_init where to create the card, so we would have to export the devfn as a parameter.
Stefano Stabellini
2012-Mar-23 11:07 UTC
Re: [QEMU][RFC PATCH 6/6] xen: handle qemu disaggregation
On Thu, 22 Mar 2012, Julien Grall wrote:> * Register QEMU in Xen as server > * Retrieve it''s own shared pages > * Check if the page is already mapping before to populate > > Signed-off-by: Julien Grall <julien.grall@citrix.com> > --- > xen-all.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- > 1 files changed, 59 insertions(+), 3 deletions(-) > > diff --git a/xen-all.c b/xen-all.c > index 2d001b8..6b7acd7 100644 > --- a/xen-all.c > +++ b/xen-all.c > @@ -61,6 +61,45 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu) > } > # define FMT_ioreq_size "u" > #endif > +#if __XEN_LATEST_INTERFACE_VERSION__ < 0x00040200At this point, given that we are close to Xen 4.2 I would make this if __XEN_LATEST_INTERFACE_VERSION__ < 0x00040300> +static inline unsigned long xen_buffered_iopage() > +{ > + unsigned long pfn; > + > + xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &pfn); > + > + return pfn; > +} > + > +static inline unsigned long xen_iopage(void) > +{ > + unsigned long pfn; > + > + xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &pfn); > + > + return pfn; > +} > +#else > +static inline unsigned long xen_buffered_iopage(void) > +{ > + unsigned long pfn; > + > + xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IO_PFN_FIRST, &pfn); > + pfn += (serverid - 1) * 2 + 2; > + return pfn; > +} > + > +static inline unsigned long xen_iopage(void) > +{ > + unsigned long pfn; > + > + xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IO_PFN_FIRST, &pfn); > + pfn += (serverid - 1) * 2 + 1; > + > + return pfn;Shouldn''t these be pfn += serverid * 2; and pfn += serverid * 2 + 1; Are you numbering serverid starting from 1?> +#endif > > #define BUFFER_IO_MAX_DELAY 100 > > @@ -349,6 +388,10 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr) > return; > } > > + if (xen_map_cache(ram_addr, size, 0)) { > + return; > + }why?> trace_xen_ram_alloc(ram_addr, size); > > nr_pfn = size >> TARGET_PAGE_BITS; > @@ -1046,7 +1089,14 @@ static void xenstore_record_dm_state(struct xs_handle *xs, const char *state) > exit(1); > } > > - snprintf(path, sizeof (path), "/local/domain/0/device-model/%u/state", xen_domid); > + if (!xen_dmid) { > + snprintf(path, sizeof (path), "/local/domain/0/device-model/%u/state", xen_domid); > + } > + else { > + snprintf(path, sizeof (path), "/local/domain/0/dms/%u/%u/state", > + xen_domid, xen_dmid); > + } > + > if (!xs_write(xs, XBT_NULL, path, state, strlen(state))) { > fprintf(stderr, "error recording dm state\n"); > exit(1); > @@ -1077,6 +1127,7 @@ static void xen_change_state_handler(void *opaque, int running, > RunState state) > { > if (running) { > + is_running = 1; > /* record state running */ > xenstore_record_dm_state(xenstore, "running"); > } > @@ -1137,7 +1188,12 @@ int xen_hvm_init(void) > state->suspend.notify = xen_suspend_notifier; > qemu_register_suspend_notifier(&state->suspend); > > - xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_IOREQ_PFN, &ioreq_pfn); > + rc = xc_hvm_register_ioreq_server(xen_xc, xen_domid, &serverid); > + > + if (rc) > + hw_error("registered server returned error %d", rc); > + > + ioreq_pfn = xen_iopage(); > DPRINTF("shared page at pfn %lx\n", ioreq_pfn); > state->shared_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE, > PROT_READ|PROT_WRITE, ioreq_pfn); > @@ -1146,7 +1202,7 @@ int xen_hvm_init(void) > errno, xen_xc); > } > > - xc_get_hvm_param(xen_xc, xen_domid, HVM_PARAM_BUFIOREQ_PFN, &ioreq_pfn); > + ioreq_pfn = xen_buffered_iopage(); > DPRINTF("buffered io page at pfn %lx\n", ioreq_pfn); > state->buffered_io_page = xc_map_foreign_range(xen_xc, xen_domid, XC_PAGE_SIZE, > PROT_READ|PROT_WRITE, ioreq_pfn); > -- > Julien Grall >
On 03/22/2012 05:44 PM, Jan Kiszka wrote:>> >> static void core_region_nop(MemoryListener *listener, >> diff --git a/ioport.c b/ioport.c >> index 78a3b89..073ed75 100644 >> --- a/ioport.c >> +++ b/ioport.c >> @@ -28,6 +28,7 @@ >> #include "ioport.h" >> #include "trace.h" >> #include "memory.h" >> +#include "hw/xen.h" >> >> /***********************************************************/ >> /* IO Port */ >> @@ -155,6 +156,11 @@ int register_ioport_read(pio_addr_t start, int length, int size, >> i); >> ioport_opaque[i] = opaque; >> } >> + >> + if (xen_enabled()) { >> + xen_map_iorange(start, length, 0); >> + } >> + >> return 0; >> } >> >> @@ -175,7 +181,13 @@ int register_ioport_write(pio_addr_t start, int length, int size, >> i); >> ioport_opaque[i] = opaque; >> } >> + >> + if (xen_enabled()) { >> + xen_map_iorange(start, length, 0); >> + } >> + >> return 0; >> + >> } >> >> static uint32_t ioport_readb_thunk(void *opaque, uint32_t addr) >> @@ -260,6 +272,11 @@ void isa_unassign_ioport(pio_addr_t start, int length) >> ioport_destructor_table[start](ioport_opaque[start]); >> ioport_destructor_table[start] = NULL; >> } >> + >> + if (xen_enabled()) { >> + xen_unmap_iorange(start, length, 0); >> + } >> + >> for(i = start; i< start + length; i++) { >> ioport_read_table[0][i] = NULL; >> ioport_read_table[1][i] = NULL; >> > memory_listener_register(xen_hooks, system_io)? >QEMU doesn''t seem to call region_add/region_del for ioport. Moreover, some of ioport are directly register without using memory hook (for example cirrus vga). What is the best way to do it ?> Even if that is not yet powerful enough, tuning the hooks is usually > better than open-coding. > > Jan > >
On 2012-03-23 16:08, Julien Grall wrote:> On 03/22/2012 05:44 PM, Jan Kiszka wrote: >>> >>> static void core_region_nop(MemoryListener *listener, >>> diff --git a/ioport.c b/ioport.c >>> index 78a3b89..073ed75 100644 >>> --- a/ioport.c >>> +++ b/ioport.c >>> @@ -28,6 +28,7 @@ >>> #include "ioport.h" >>> #include "trace.h" >>> #include "memory.h" >>> +#include "hw/xen.h" >>> >>> /***********************************************************/ >>> /* IO Port */ >>> @@ -155,6 +156,11 @@ int register_ioport_read(pio_addr_t start, int >>> length, int size, >>> i); >>> ioport_opaque[i] = opaque; >>> } >>> + >>> + if (xen_enabled()) { >>> + xen_map_iorange(start, length, 0); >>> + } >>> + >>> return 0; >>> } >>> >>> @@ -175,7 +181,13 @@ int register_ioport_write(pio_addr_t start, int >>> length, int size, >>> i); >>> ioport_opaque[i] = opaque; >>> } >>> + >>> + if (xen_enabled()) { >>> + xen_map_iorange(start, length, 0); >>> + } >>> + >>> return 0; >>> + >>> } >>> >>> static uint32_t ioport_readb_thunk(void *opaque, uint32_t addr) >>> @@ -260,6 +272,11 @@ void isa_unassign_ioport(pio_addr_t start, int >>> length) >>> ioport_destructor_table[start](ioport_opaque[start]); >>> ioport_destructor_table[start] = NULL; >>> } >>> + >>> + if (xen_enabled()) { >>> + xen_unmap_iorange(start, length, 0); >>> + } >>> + >>> for(i = start; i< start + length; i++) { >>> ioport_read_table[0][i] = NULL; >>> ioport_read_table[1][i] = NULL; >>> >> memory_listener_register(xen_hooks, system_io)? >> > QEMU doesn''t seem to call region_add/region_del for ioport. > Moreover, some of ioport are directly register without > using memory hook (for example cirrus vga). > > What is the best way to do it ?I haven''t looked at details. Maybe it is just a combination of "use case not yet considered, but can easily be added" and "need to switch legacy code to new scheme". Then this still remains the better option than this hook. Avi? Jan
Anthony Liguori
2012-Mar-23 16:47 UTC
Re: [QEMU][RFC PATCH 3/6] memory: Add xen memory hook
On 03/22/2012 11:01 AM, Julien Grall wrote:> QEMU will now register all memory range (PIO and MMIO) in Xen. > We distinct two phases in memory registered : > - initialization > - running > > For all range registered during the initialization, QEMU will > check with XenStore if it is authorized to use them. > After the initialization, QEMU can register all range. Indeed, > the new ranges will be for PCI Bar. > > Signed-off-by: Julien Grall<julien.grall@citrix.com> > --- > exec.c | 9 ++++++ > ioport.c | 17 ++++++++++++ > xen-all.c | 83 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 109 insertions(+), 0 deletions(-) > > diff --git a/exec.c b/exec.c > index 780f63f..42d8c56 100644 > --- a/exec.c > +++ b/exec.c > @@ -3557,12 +3557,21 @@ static void core_commit(MemoryListener *listener) > static void core_region_add(MemoryListener *listener, > MemoryRegionSection *section) > { > + if (xen_enabled()) { > + xen_map_iorange(section->offset_within_address_space, > + section->size, 1); > + } > + > cpu_register_physical_memory_log(section, section->readonly); > } > > static void core_region_del(MemoryListener *listener, > MemoryRegionSection *section) > { > + if (xen_enabled()) { > + xen_unmap_iorange(section->offset_within_address_space, > + section->size, 1); > + } > } > > static void core_region_nop(MemoryListener *listener, > diff --git a/ioport.c b/ioport.c > index 78a3b89..073ed75 100644 > --- a/ioport.c > +++ b/ioport.c > @@ -28,6 +28,7 @@ > #include "ioport.h" > #include "trace.h" > #include "memory.h" > +#include "hw/xen.h" > > /***********************************************************/ > /* IO Port */ > @@ -155,6 +156,11 @@ int register_ioport_read(pio_addr_t start, int length, int size, > i); > ioport_opaque[i] = opaque; > } > + > + if (xen_enabled()) { > + xen_map_iorange(start, length, 0); > + } > + > return 0; > } > > @@ -175,7 +181,13 @@ int register_ioport_write(pio_addr_t start, int length, int size, > i); > ioport_opaque[i] = opaque; > } > + > + if (xen_enabled()) { > + xen_map_iorange(start, length, 0); > + } > + > return 0; > + > }This is the opposite direction we need to head. I really don''t think this series is the right way to handle things. I don''t want to see random hooks throughout QEMU to intercept for APIs affectively disabling large chunks of QEMU in the process. You should look at (1) creating only the devices you want (2) use a clean interface to interact with those devices. That would mean having a Xen specific AddressSpaceOps for ioports or something like that. Not having hooks in areas of code like this. Regards, Anthony Liguori
On 03/23/2012 06:37 PM, Jan Kiszka wrote:> On 2012-03-23 16:08, Julien Grall wrote: > > On 03/22/2012 05:44 PM, Jan Kiszka wrote: > >>> > >>> static void core_region_nop(MemoryListener *listener, > >>> diff --git a/ioport.c b/ioport.c > >>> index 78a3b89..073ed75 100644 > >>> --- a/ioport.c > >>> +++ b/ioport.c > >>> @@ -28,6 +28,7 @@ > >>> #include "ioport.h" > >>> #include "trace.h" > >>> #include "memory.h" > >>> +#include "hw/xen.h" > >>> > >>> /***********************************************************/ > >>> /* IO Port */ > >>> @@ -155,6 +156,11 @@ int register_ioport_read(pio_addr_t start, int > >>> length, int size, > >>> i); > >>> ioport_opaque[i] = opaque; > >>> } > >>> + > >>> + if (xen_enabled()) { > >>> + xen_map_iorange(start, length, 0); > >>> + } > >>> + > >>> return 0; > >>> } > >>> > >>> @@ -175,7 +181,13 @@ int register_ioport_write(pio_addr_t start, int > >>> length, int size, > >>> i); > >>> ioport_opaque[i] = opaque; > >>> } > >>> + > >>> + if (xen_enabled()) { > >>> + xen_map_iorange(start, length, 0); > >>> + } > >>> + > >>> return 0; > >>> + > >>> } > >>> > >>> static uint32_t ioport_readb_thunk(void *opaque, uint32_t addr) > >>> @@ -260,6 +272,11 @@ void isa_unassign_ioport(pio_addr_t start, int > >>> length) > >>> ioport_destructor_table[start](ioport_opaque[start]); > >>> ioport_destructor_table[start] = NULL; > >>> } > >>> + > >>> + if (xen_enabled()) { > >>> + xen_unmap_iorange(start, length, 0); > >>> + } > >>> + > >>> for(i = start; i< start + length; i++) { > >>> ioport_read_table[0][i] = NULL; > >>> ioport_read_table[1][i] = NULL; > >>> > >> memory_listener_register(xen_hooks, system_io)? > >> > > QEMU doesn''t seem to call region_add/region_del for ioport. > > Moreover, some of ioport are directly register without > > using memory hook (for example cirrus vga). > > > > What is the best way to do it ? > > I haven''t looked at details. Maybe it is just a combination of "use case > not yet considered, but can easily be added" and "need to switch legacy > code to new scheme". Then this still remains the better option than this > hook. Avi?Just the second - region_add/del will be called, but only for ioports registered via the MemoryRegion APIs. -- error compiling committee.c: too many arguments to function
On 03/23/2012 01:02 PM, Stefano Stabellini wrote:> Maybe the best thing to do is to have a set of machine specific options > to select what devices need to be built in the machine. > Most devices already can be dynamically selected: NICs, usb, acpi, > cirrus, etc. > We already have a Xen machine (xenfv_machine) that uses pc_init1 to > initialize it. > We could have a simple bitmask to determine what devices need to be > enabled, then in pc_init1 we could have something like: > > if (devices & VGA_ENABLE) { > pc_vga_init(); > } > > Given the number of enable variable already present in the code > (pci_enabled, acpi_enabled, usb_enabled, xen_enabled, > cirrus_vga_enabled, ...), it might end up actually making the code more > readable. The flexibility could end up being useful in the generic case > as well, for testing if nothing else. > > We would still need to call xc_hvm_register_pcidev to register PCI > devices in Xen, but we wouldn''t expect to fail unless there was a > misconfiguration somewhere. In that case we could just exit. > > > > Now the problem is: there isn''t a simple way to specify the BDF where > you want to create the device; pci_create_simple takes a devfn but most > of the higher level functions (pc_vga_init, pci_nic_init_nofail, ...) > don''t export the parameter at the moment. > We would need to be able to tell pc_vga_init where to create the card, > so we would have to export the devfn as a parameter. >You already have total flexibility with the -device foo parameter. It allows you to create any device, anywhere, with whatever configuration you want. Use in conjunction with -nodefconfig. You may want your own host/pci bridge that lacks the device 0 configuration space. -- error compiling committee.c: too many arguments to function
Stefano Stabellini
2012-Mar-26 11:01 UTC
Re: [QEMU][RFC PATCH 3/6] memory: Add xen memory hook
On Sun, 25 Mar 2012, Avi Kivity wrote:> On 03/23/2012 06:37 PM, Jan Kiszka wrote: > > On 2012-03-23 16:08, Julien Grall wrote: > > > On 03/22/2012 05:44 PM, Jan Kiszka wrote: > > >>> > > >>> static void core_region_nop(MemoryListener *listener, > > >>> diff --git a/ioport.c b/ioport.c > > >>> index 78a3b89..073ed75 100644 > > >>> --- a/ioport.c > > >>> +++ b/ioport.c > > >>> @@ -28,6 +28,7 @@ > > >>> #include "ioport.h" > > >>> #include "trace.h" > > >>> #include "memory.h" > > >>> +#include "hw/xen.h" > > >>> > > >>> /***********************************************************/ > > >>> /* IO Port */ > > >>> @@ -155,6 +156,11 @@ int register_ioport_read(pio_addr_t start, int > > >>> length, int size, > > >>> i); > > >>> ioport_opaque[i] = opaque; > > >>> } > > >>> + > > >>> + if (xen_enabled()) { > > >>> + xen_map_iorange(start, length, 0); > > >>> + } > > >>> + > > >>> return 0; > > >>> } > > >>> > > >>> @@ -175,7 +181,13 @@ int register_ioport_write(pio_addr_t start, int > > >>> length, int size, > > >>> i); > > >>> ioport_opaque[i] = opaque; > > >>> } > > >>> + > > >>> + if (xen_enabled()) { > > >>> + xen_map_iorange(start, length, 0); > > >>> + } > > >>> + > > >>> return 0; > > >>> + > > >>> } > > >>> > > >>> static uint32_t ioport_readb_thunk(void *opaque, uint32_t addr) > > >>> @@ -260,6 +272,11 @@ void isa_unassign_ioport(pio_addr_t start, int > > >>> length) > > >>> ioport_destructor_table[start](ioport_opaque[start]); > > >>> ioport_destructor_table[start] = NULL; > > >>> } > > >>> + > > >>> + if (xen_enabled()) { > > >>> + xen_unmap_iorange(start, length, 0); > > >>> + } > > >>> + > > >>> for(i = start; i< start + length; i++) { > > >>> ioport_read_table[0][i] = NULL; > > >>> ioport_read_table[1][i] = NULL; > > >>> > > >> memory_listener_register(xen_hooks, system_io)? > > >> > > > QEMU doesn''t seem to call region_add/region_del for ioport. > > > Moreover, some of ioport are directly register without > > > using memory hook (for example cirrus vga). > > > > > > What is the best way to do it ? > > > > I haven''t looked at details. Maybe it is just a combination of "use case > > not yet considered, but can easily be added" and "need to switch legacy > > code to new scheme". Then this still remains the better option than this > > hook. Avi? > > Just the second - region_add/del will be called, but only for ioports > registered via the MemoryRegion APIs.It looks like there are quite a few register_ioport_read/write left around, especially in the following files: hw/acpi_piix4.c hw/cirrus_vga.c hw/serial.c hw/pckbd.c hw/pc.c I guess they should all be converted to memory_region_init_io, right?
On 03/26/2012 01:01 PM, Stefano Stabellini wrote:> On Sun, 25 Mar 2012, Avi Kivity wrote: > > On 03/23/2012 06:37 PM, Jan Kiszka wrote: > > > On 2012-03-23 16:08, Julien Grall wrote: > > > > On 03/22/2012 05:44 PM, Jan Kiszka wrote: > > > >>> > > > >>> static void core_region_nop(MemoryListener *listener, > > > >>> diff --git a/ioport.c b/ioport.c > > > >>> index 78a3b89..073ed75 100644 > > > >>> --- a/ioport.c > > > >>> +++ b/ioport.c > > > >>> @@ -28,6 +28,7 @@ > > > >>> #include "ioport.h" > > > >>> #include "trace.h" > > > >>> #include "memory.h" > > > >>> +#include "hw/xen.h" > > > >>> > > > >>> /***********************************************************/ > > > >>> /* IO Port */ > > > >>> @@ -155,6 +156,11 @@ int register_ioport_read(pio_addr_t start, int > > > >>> length, int size, > > > >>> i); > > > >>> ioport_opaque[i] = opaque; > > > >>> } > > > >>> + > > > >>> + if (xen_enabled()) { > > > >>> + xen_map_iorange(start, length, 0); > > > >>> + } > > > >>> + > > > >>> return 0; > > > >>> } > > > >>> > > > >>> @@ -175,7 +181,13 @@ int register_ioport_write(pio_addr_t start, int > > > >>> length, int size, > > > >>> i); > > > >>> ioport_opaque[i] = opaque; > > > >>> } > > > >>> + > > > >>> + if (xen_enabled()) { > > > >>> + xen_map_iorange(start, length, 0); > > > >>> + } > > > >>> + > > > >>> return 0; > > > >>> + > > > >>> } > > > >>> > > > >>> static uint32_t ioport_readb_thunk(void *opaque, uint32_t addr) > > > >>> @@ -260,6 +272,11 @@ void isa_unassign_ioport(pio_addr_t start, int > > > >>> length) > > > >>> ioport_destructor_table[start](ioport_opaque[start]); > > > >>> ioport_destructor_table[start] = NULL; > > > >>> } > > > >>> + > > > >>> + if (xen_enabled()) { > > > >>> + xen_unmap_iorange(start, length, 0); > > > >>> + } > > > >>> + > > > >>> for(i = start; i< start + length; i++) { > > > >>> ioport_read_table[0][i] = NULL; > > > >>> ioport_read_table[1][i] = NULL; > > > >>> > > > >> memory_listener_register(xen_hooks, system_io)? > > > >> > > > > QEMU doesn''t seem to call region_add/region_del for ioport. > > > > Moreover, some of ioport are directly register without > > > > using memory hook (for example cirrus vga). > > > > > > > > What is the best way to do it ? > > > > > > I haven''t looked at details. Maybe it is just a combination of "use case > > > not yet considered, but can easily be added" and "need to switch legacy > > > code to new scheme". Then this still remains the better option than this > > > hook. Avi? > > > > Just the second - region_add/del will be called, but only for ioports > > registered via the MemoryRegion APIs. > > It looks like there are quite a few register_ioport_read/write left > around, especially in the following files: > > hw/acpi_piix4.c > hw/cirrus_vga.c > hw/serial.c > hw/pckbd.c > hw/pc.c > > I guess they should all be converted to memory_region_init_io, right?Right. -- error compiling committee.c: too many arguments to function
On 03/26/2012 12:02 PM, Avi Kivity wrote:> On 03/26/2012 01:01 PM, Stefano Stabellini wrote: > >> On Sun, 25 Mar 2012, Avi Kivity wrote: >> >>> On 03/23/2012 06:37 PM, Jan Kiszka wrote: >>> >>>> On 2012-03-23 16:08, Julien Grall wrote: >>>> >>>>> On 03/22/2012 05:44 PM, Jan Kiszka wrote: >>>>> >>>>>>> static void core_region_nop(MemoryListener *listener, >>>>>>> diff --git a/ioport.c b/ioport.c >>>>>>> index 78a3b89..073ed75 100644 >>>>>>> --- a/ioport.c >>>>>>> +++ b/ioport.c >>>>>>> @@ -28,6 +28,7 @@ >>>>>>> #include "ioport.h" >>>>>>> #include "trace.h" >>>>>>> #include "memory.h" >>>>>>> +#include "hw/xen.h" >>>>>>> >>>>>>> /***********************************************************/ >>>>>>> /* IO Port */ >>>>>>> @@ -155,6 +156,11 @@ int register_ioport_read(pio_addr_t start, int >>>>>>> length, int size, >>>>>>> i); >>>>>>> ioport_opaque[i] = opaque; >>>>>>> } >>>>>>> + >>>>>>> + if (xen_enabled()) { >>>>>>> + xen_map_iorange(start, length, 0); >>>>>>> + } >>>>>>> + >>>>>>> return 0; >>>>>>> } >>>>>>> >>>>>>> @@ -175,7 +181,13 @@ int register_ioport_write(pio_addr_t start, int >>>>>>> length, int size, >>>>>>> i); >>>>>>> ioport_opaque[i] = opaque; >>>>>>> } >>>>>>> + >>>>>>> + if (xen_enabled()) { >>>>>>> + xen_map_iorange(start, length, 0); >>>>>>> + } >>>>>>> + >>>>>>> return 0; >>>>>>> + >>>>>>> } >>>>>>> >>>>>>> static uint32_t ioport_readb_thunk(void *opaque, uint32_t addr) >>>>>>> @@ -260,6 +272,11 @@ void isa_unassign_ioport(pio_addr_t start, int >>>>>>> length) >>>>>>> ioport_destructor_table[start](ioport_opaque[start]); >>>>>>> ioport_destructor_table[start] = NULL; >>>>>>> } >>>>>>> + >>>>>>> + if (xen_enabled()) { >>>>>>> + xen_unmap_iorange(start, length, 0); >>>>>>> + } >>>>>>> + >>>>>>> for(i = start; i< start + length; i++) { >>>>>>> ioport_read_table[0][i] = NULL; >>>>>>> ioport_read_table[1][i] = NULL; >>>>>>> >>>>>>> >>>>>> memory_listener_register(xen_hooks, system_io)? >>>>>> >>>>>> >>>>> QEMU doesn''t seem to call region_add/region_del for ioport. >>>>> Moreover, some of ioport are directly register without >>>>> using memory hook (for example cirrus vga). >>>>> >>>>> What is the best way to do it ? >>>>> >>>> I haven''t looked at details. Maybe it is just a combination of "use case >>>> not yet considered, but can easily be added" and "need to switch legacy >>>> code to new scheme". Then this still remains the better option than this >>>> hook. Avi? >>>> >>> Just the second - region_add/del will be called, but only for ioports >>> registered via the MemoryRegion APIs. >>> >> It looks like there are quite a few register_ioport_read/write left >> around, especially in the following files: >> >> hw/acpi_piix4.c >> hw/cirrus_vga.c >> hw/serial.c >> hw/pckbd.c >> hw/pc.c >> >> I guess they should all be converted to memory_region_init_io, right? >> > RightI will modify theses files and send a different patch series.
Stefano Stabellini
2012-Mar-26 11:45 UTC
Re: [QEMU][RFC PATCH 4/6] xen-pci: Register PCI in Xen
On Sun, 25 Mar 2012, Avi Kivity wrote:> On 03/23/2012 01:02 PM, Stefano Stabellini wrote: > > Maybe the best thing to do is to have a set of machine specific options > > to select what devices need to be built in the machine. > > Most devices already can be dynamically selected: NICs, usb, acpi, > > cirrus, etc. > > We already have a Xen machine (xenfv_machine) that uses pc_init1 to > > initialize it. > > We could have a simple bitmask to determine what devices need to be > > enabled, then in pc_init1 we could have something like: > > > > if (devices & VGA_ENABLE) { > > pc_vga_init(); > > } > > > > Given the number of enable variable already present in the code > > (pci_enabled, acpi_enabled, usb_enabled, xen_enabled, > > cirrus_vga_enabled, ...), it might end up actually making the code more > > readable. The flexibility could end up being useful in the generic case > > as well, for testing if nothing else. > > > > We would still need to call xc_hvm_register_pcidev to register PCI > > devices in Xen, but we wouldn''t expect to fail unless there was a > > misconfiguration somewhere. In that case we could just exit. > > > > > > > > Now the problem is: there isn''t a simple way to specify the BDF where > > you want to create the device; pci_create_simple takes a devfn but most > > of the higher level functions (pc_vga_init, pci_nic_init_nofail, ...) > > don''t export the parameter at the moment. > > We would need to be able to tell pc_vga_init where to create the card, > > so we would have to export the devfn as a parameter. > > > > You already have total flexibility with the -device foo parameter. It > allows you to create any device, anywhere, with whatever configuration > you want. Use in conjunction with -nodefconfig.Thanks, -device looks exactly like what we need! However I think that the option to suppress the defaults is -nodefaults.> You may want your own host/pci bridge that lacks the device 0 > configuration space.In order not to disrupt the emulated machine in QEMU too much, I was thinking to let QEMU create the default device 0 and device 1: 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) and then have only the first QEMU register itself for IO events in Xen related to these devices. That means that only the first QEMU would actually receive any events to handle while the other QEMUs would never receive any events for these devices. Then everything else would go through -device: a device is created only if the command line option is passed and in that case QEMU also registers itself as the handler of this specific device in Xen. There is supposed to be no overlaps in the configuration, so if two QEMUs both register for the same device Xen would return error and QEMU would exit. The reason for doing this is that I am not sure that all OSes would be able to cope with the ISA bridge being at a location different than 00:01.0 or the IDE controller being on a different device from the ISA bridge, considering that they are supposed to be two functions of the same device (Intel PIIX southbridge). So at that point we might as well leave them as they are and try to disrupt the basic config at little as possible.
On 03/26/2012 01:45 PM, Stefano Stabellini wrote:> > > > > > > > > Now the problem is: there isn''t a simple way to specify the BDF where > > > you want to create the device; pci_create_simple takes a devfn but most > > > of the higher level functions (pc_vga_init, pci_nic_init_nofail, ...) > > > don''t export the parameter at the moment. > > > We would need to be able to tell pc_vga_init where to create the card, > > > so we would have to export the devfn as a parameter. > > > > > > > You already have total flexibility with the -device foo parameter. It > > allows you to create any device, anywhere, with whatever configuration > > you want. Use in conjunction with -nodefconfig. > > Thanks, -device looks exactly like what we need! > However I think that the option to suppress the defaults is -nodefaults.Correct, as I was just informed in another thread.> > > > You may want your own host/pci bridge that lacks the device 0 > > configuration space. > > In order not to disrupt the emulated machine in QEMU too much, I was > thinking to let QEMU create the default device 0 and device 1: > > 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) > 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] > 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] > 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) > > and then have only the first QEMU register itself for IO events in Xen > related to these devices. That means that only the first QEMU would > actually receive any events to handle while the other QEMUs would never > receive any events for these devices. > > Then everything else would go through -device: a device is created > only if the command line option is passed and in that case QEMU > also registers itself as the handler of this specific device in Xen. > > There is supposed to be no overlaps in the configuration, so if two > QEMUs both register for the same device Xen would return error and QEMU > would exit. > > > The reason for doing this is that I am not sure that all OSes would be > able to cope with the ISA bridge being at a location different than > 00:01.0 or the IDE controller being on a different device from the ISA > bridge, considering that they are supposed to be two functions of the > same device (Intel PIIX southbridge). > So at that point we might as well leave them as they are and try to > disrupt the basic config at little as possible.Yes, but won''t all qemus have those 00:01.0 devices and try to register for them? What about if two BARs (from different devices) are configured for the same address ranges? -- error compiling committee.c: too many arguments to function
Stefano Stabellini
2012-Mar-26 12:20 UTC
Re: [QEMU][RFC PATCH 4/6] xen-pci: Register PCI in Xen
On Mon, 26 Mar 2012, Avi Kivity wrote:> > > You may want your own host/pci bridge that lacks the device 0 > > > configuration space. > > > > In order not to disrupt the emulated machine in QEMU too much, I was > > thinking to let QEMU create the default device 0 and device 1: > > > > 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) > > 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] > > 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] > > 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) > > > > and then have only the first QEMU register itself for IO events in Xen > > related to these devices. That means that only the first QEMU would > > actually receive any events to handle while the other QEMUs would never > > receive any events for these devices. > > > > Then everything else would go through -device: a device is created > > only if the command line option is passed and in that case QEMU > > also registers itself as the handler of this specific device in Xen. > > > > There is supposed to be no overlaps in the configuration, so if two > > QEMUs both register for the same device Xen would return error and QEMU > > would exit. > > > > > > The reason for doing this is that I am not sure that all OSes would be > > able to cope with the ISA bridge being at a location different than > > 00:01.0 or the IDE controller being on a different device from the ISA > > bridge, considering that they are supposed to be two functions of the > > same device (Intel PIIX southbridge). > > So at that point we might as well leave them as they are and try to > > disrupt the basic config at little as possible. > > Yes, but won''t all qemus have those 00:01.0 devices and try to register > for them?Yes, this is where it becomes ugly again. One possibility would be to have a special option, maybe on xenstore, to tell QEMU "(do not)register North and South Bridge". In the register_device callback that we''ll have in xen-all.c, we''ll have a brief list of "special devices" that we might want to ignore even if they are being emulated. If this approach turns out to be too ugly from the code point of view, we might have to make 00:00.0 and 00:01.X configurable too.> What about if two BARs (from different devices) are configured for the > same address ranges?I think that it should have the same chance of happening as if there was just one QEMU, because from the guest OS and firmware POV the emulated hardware is the same. In any case Xen would return an error and QEMU can either exit or try to cope with it.
On 03/26/2012 02:20 PM, Stefano Stabellini wrote:> On Mon, 26 Mar 2012, Avi Kivity wrote: > > > > You may want your own host/pci bridge that lacks the device 0 > > > > configuration space. > > > > > > In order not to disrupt the emulated machine in QEMU too much, I was > > > thinking to let QEMU create the default device 0 and device 1: > > > > > > 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02) > > > 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] > > > 00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] > > > 00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03) > > > > > > and then have only the first QEMU register itself for IO events in Xen > > > related to these devices. That means that only the first QEMU would > > > actually receive any events to handle while the other QEMUs would never > > > receive any events for these devices. > > > > > > Then everything else would go through -device: a device is created > > > only if the command line option is passed and in that case QEMU > > > also registers itself as the handler of this specific device in Xen. > > > > > > There is supposed to be no overlaps in the configuration, so if two > > > QEMUs both register for the same device Xen would return error and QEMU > > > would exit. > > > > > > > > > The reason for doing this is that I am not sure that all OSes would be > > > able to cope with the ISA bridge being at a location different than > > > 00:01.0 or the IDE controller being on a different device from the ISA > > > bridge, considering that they are supposed to be two functions of the > > > same device (Intel PIIX southbridge). > > > So at that point we might as well leave them as they are and try to > > > disrupt the basic config at little as possible. > > > > Yes, but won''t all qemus have those 00:01.0 devices and try to register > > for them? > > Yes, this is where it becomes ugly again. > > One possibility would be to have a special option, maybe on xenstore, to > tell QEMU "(do not)register North and South Bridge". > In the register_device callback that we''ll have in xen-all.c, we''ll have > a brief list of "special devices" that we might want to ignore even if > they are being emulated. > If this approach turns out to be too ugly from the code point of view, > we might have to make 00:00.0 and 00:01.X configurable too. > > > > What about if two BARs (from different devices) are configured for the > > same address ranges? > > I think that it should have the same chance of happening as if there was > just one QEMU, because from the guest OS and firmware POV the emulated > hardware is the same. In any case Xen would return an error and QEMU can > either exit or try to cope with it.How can qemu cope? In a normal situation it''s aware of all the devices, here it''s not aware of any device (except the one it is managing). You''re trying to convert a hierarchical problem into a flat one with no communication. What happens if one of the devices is a PCI-PCI bridge and it turns off its PCI window? The devices behind it should no longer respond, yet they know nothing about it. I think you need to preserve the hierarchy. The host-pci bridge needs to talk to devices behind it, (as does a pci-pci bridge). -- error compiling committee.c: too many arguments to function
On 03/26/2012 01:24 PM, Julien Grall wrote:>>> It looks like there are quite a few register_ioport_read/write left >>> around, especially in the following files: >>> >>> hw/acpi_piix4.c >>> hw/cirrus_vga.c >>> hw/serial.c >>> hw/pckbd.c >>> hw/pc.c >>> >>> I guess they should all be converted to memory_region_init_io, right? >>> >> Right > I will modify theses files and send a different patch series. >Great. Please post them as a separate series, they can go in relatively quickly since they should be mostly straighforward. -- error compiling committee.c: too many arguments to function
Stefano Stabellini
2012-Mar-26 13:56 UTC
Re: [QEMU][RFC PATCH 4/6] xen-pci: Register PCI in Xen
On Mon, 26 Mar 2012, Avi Kivity wrote:> > > What about if two BARs (from different devices) are configured for the > > > same address ranges? > > > > I think that it should have the same chance of happening as if there was > > just one QEMU, because from the guest OS and firmware POV the emulated > > hardware is the same. In any case Xen would return an error and QEMU can > > either exit or try to cope with it. > > How can qemu cope? In a normal situation it''s aware of all the devices, > here it''s not aware of any device (except the one it is managing). > > You''re trying to convert a hierarchical problem into a flat one with no > communication. What happens if one of the devices is a PCI-PCI bridge > and it turns off its PCI window? The devices behind it should no longer > respond, yet they know nothing about it. > > I think you need to preserve the hierarchy. The host-pci bridge needs > to talk to devices behind it, (as does a pci-pci bridge).PCI bridges are a problem, in fact I was thinking to just assume that there are no PCI Bridges with PCI devices behind them other than the Host PCI Bridge for now. There is going to be a need for some kind of communication between the multiple QEMU instances, for example for ACPI power state changes and PCI Bridges, like you wrote. So I think that at some point we''ll need to introduce a type of message that originates from one of the QEMU instances (the first one especially) and goes to all the others. It should probably be relayed by the hypervisor.