Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 00/17] QEMU disaggregation in Xen environment
Hello,

This patch series only concerns Xen. Another series will follow for QEMU.

I'm currently working on QEMU disaggregation in a Xen environment. The goal is to be able to run multiple QEMU instances for the same domain (http://lists.xen.org/archives/html/xen-devel/2012-03/msg00299.html).

I sent a first version of the patch series a few months ago:
- QEMU: https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg04401.html
- Xen: http://lists.xen.org/archives/html/xen-devel/2012-03/msg01947.html

Based on the feedback received, I have improved both the QEMU and the Xen modifications. As before, I will send two patch series, one for QEMU and the other for Xen.

Full disaggregation (one device = one QEMU) is not possible because many devices depend on each other. With Stefano's help, I have defined the following possible units of disaggregation:
- ui: emulates default devices (root bridge, south bridge), VGA, keyboard, mouse and USB
- audio: emulates audio
- ide: emulates disks
- serial: emulates the serial port
- net: it is possible to have multiple QEMUs that each emulate one or more network cards

Of course, a single QEMU can emulate both ui and audio.

Old configuration files using qemu-xen still work. The patch series adds an option "device_models". Example:

builder='hvm'
memory = 1024
name = "Debian"
vcpus=1
vif = [ 'type=ioemu, bridge=eth0, mac=00:16:3e:0e:f5:ef, id=nic1' ]
disk = [ 'tap:tapdisk:qcow2:/home/xentest/works/vms/debian.img,xvda,w' ]
device_model_override = '/home/xentest/works/qemu-devel/qemu-wrapper'
device_model_version = 'qemu-xen'
device_models = [ 'name=qnet,vifs=nic1', 'name=qall,ui,ide' ]

It is possible to override the device model path for each device model, which can be useful for debugging. For instance: 'name=qnet,vifs=nic1,path=/my/path/wrapper'.

The "name" option is used for log filenames and debugging; if it is not specified, a number is used.

Modifications between V1 and V2:
- rewrite the libxl patches according to the new API
- improve user experience with the configuration file (no need to specify the bdf)
- improve the PCI hypercall: use bus, domain, device, function instead of bdf
- fix the PCI config space handler
- remove unused HVM parameters
- handle save/restore

Drawbacks:
- PCI hotplug doesn't work
- stubdomains don't work because the old QEMU has not been modified for disaggregation. By the way, it works with the XenClient stubdomain
- Which QEMU needs to emulate Xen Platform? It's mainly used to unplug network cards and disks

Possible improvements:
- Like the HVM get-parameter call, introduce a hypercall to retrieve the shared pages. For the moment the server id is used
- Specify whether we want a buffered I/O shared page or not (this was Christian Limpach's idea)

I haven't tested all configurations. Comments, bug reports, ... are welcome.

Julien Grall (17):
  hvm: Modify interface to support multiple ioreq server
  hvm: Add functions to handle ioreq servers
  hvm-pci: Handle PCI config space in Xen
  hvm: Change initialization/destruction of an hvm
  hvm: Modify hvm_op
  hvm-io: IO refactoring with ioreq server
  hvm-io: send invalidate map cache to each registered servers
  hvm-io: Handle server in buffered IO
  xc: Add the hypercall for multiple servers
  xc: Add argument to allocate more special pages
  xc: modify save/restore to support multiple device models
  xl: Add interface to handle qemu disaggregation
  xl: add device model id to qmp functions
  xl-parsing: Parse new device_models option
  xl: support spawn/destroy on multiple device model
  xl: Fix PCI library
  xl: implement save/restore for multiple device models

 tools/libxc/xc_domain.c           |  155 ++++++++++
 tools/libxc/xc_domain_restore.c   |  150 ++++++++---
 tools/libxc/xc_domain_save.c      |    6 +-
 tools/libxc/xc_hvm_build_x86.c    |   59 ++--
 tools/libxc/xenctrl.h             |   21 ++
 tools/libxc/xenguest.h            |    4 +-
 tools/libxl/Makefile              |    2 +-
 tools/libxl/libxl.c               |   21 +-
 tools/libxl/libxl.h               |    3 +
 tools/libxl/libxl_create.c        |  150 ++++++++---
 tools/libxl/libxl_device.c        |    7 +-
 tools/libxl/libxl_dm.c            |  369 +++++++++++++++++-------
 tools/libxl/libxl_dom.c           |  147 ++++++++--
 tools/libxl/libxl_internal.h      |   76 ++++--
 tools/libxl/libxl_pci.c           |   19 +-
 tools/libxl/libxl_qmp.c           |   49 ++--
 tools/libxl/libxl_types.idl       |   15 +
 tools/libxl/libxlu_dm.c           |   96 +++++++
 tools/libxl/libxlutil.h           |    5 +
 tools/libxl/xl_cmdimpl.c          |   29 ++-
 tools/python/xen/lowlevel/xc/xc.c |    3 +-
 xen/arch/x86/hvm/Makefile         |    1 +
 xen/arch/x86/hvm/emulate.c        |   56 ++++
 xen/arch/x86/hvm/hvm.c            |  567 +++++++++++++++++++++++++++++++------
 xen/arch/x86/hvm/io.c             |   90 +++++--
 xen/arch/x86/hvm/pci_emul.c       |  168 +++++++++++
 xen/include/asm-x86/hvm/domain.h  |   25 ++-
 xen/include/asm-x86/hvm/support.h |   26 ++-
 xen/include/asm-x86/hvm/vcpu.h    |    4 +-
 xen/include/public/hvm/hvm_op.h   |   51 ++++
 xen/include/public/hvm/ioreq.h    |    1 +
 xen/include/public/hvm/params.h   |   11 +-
 xen/include/public/xen.h          |    1 +
 xen/include/xen/hvm/pci_emul.h    |   29 ++
 34 files changed, 1986 insertions(+), 430 deletions(-)
 create mode 100644 tools/libxl/libxlu_dm.c
 create mode 100644 xen/arch/x86/hvm/pci_emul.c
 create mode 100644 xen/include/xen/hvm/pci_emul.h

--
Julien Grall
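
To make the shape of a "device_models" entry concrete, here is a minimal standalone sketch of how one entry such as 'name=qall,ui,ide' could be split into its parts. It is not the parser from this series (that lives in tools/libxl/libxlu_dm.c, which is not shown here). The names `dm_spec`, `dm_spec_parse` and the `DM_CAP_*` flags are hypothetical, and treating "audio" and "serial" as bare tokens alongside "ui" and "ide" is an assumption drawn from the category list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability flags, one per bare token. */
enum {
    DM_CAP_UI     = 1 << 0,
    DM_CAP_AUDIO  = 1 << 1,
    DM_CAP_IDE    = 1 << 2,
    DM_CAP_SERIAL = 1 << 3
};

/* Hypothetical parsed form of one device_models entry. */
struct dm_spec {
    char name[32];     /* "name=" value, empty if absent */
    char vifs[64];     /* "vifs=" value, kept as raw text */
    char path[128];    /* "path=" value, empty if absent */
    unsigned int caps; /* OR of DM_CAP_* */
};

/* True if the token [tok, tok+len) is exactly the given word. */
static int token_is(const char *tok, size_t len, const char *word)
{
    return len == strlen(word) && strncmp(tok, word, len) == 0;
}

/*
 * If the token starts with key (which includes the '='), copy its value
 * into dst and return 1. Values longer than the buffer are truncated.
 */
static int token_value(const char *tok, size_t len, const char *key,
                       char *dst, size_t dst_len)
{
    size_t klen = strlen(key);

    if (len < klen || strncmp(tok, key, klen) != 0)
        return 0;
    len -= klen;
    if (len >= dst_len)
        len = dst_len - 1;
    memcpy(dst, tok + klen, len);
    dst[len] = '\0';
    return 1;
}

/* Parse one comma-separated entry. Returns 0, or -1 on an unknown token. */
int dm_spec_parse(const char *spec, struct dm_spec *out)
{
    assert(spec != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    while (*spec != '\0') {
        size_t len = strcspn(spec, ",");

        if (token_value(spec, len, "name=", out->name, sizeof(out->name)) ||
            token_value(spec, len, "vifs=", out->vifs, sizeof(out->vifs)) ||
            token_value(spec, len, "path=", out->path, sizeof(out->path))) {
            /* key=value token already stored by token_value() */
        } else if (token_is(spec, len, "ui")) {
            out->caps |= DM_CAP_UI;
        } else if (token_is(spec, len, "audio")) {
            out->caps |= DM_CAP_AUDIO;
        } else if (token_is(spec, len, "ide")) {
            out->caps |= DM_CAP_IDE;
        } else if (token_is(spec, len, "serial")) {
            out->caps |= DM_CAP_SERIAL;
        } else {
            return -1;
        }

        spec += len;
        if (*spec == ',')
            spec++;
    }
    return 0;
}
```

With this sketch, 'name=qall,ui,ide' yields the name "qall" with the ui and ide flags set, while 'name=qnet,vifs=nic1' yields no flags and the raw vif list "nic1".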
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 01/17] hvm: Modify interface to support multiple ioreq server
Add a structure to handle ioreq servers. An ioreq server is a server which can handle a range of IO (MMIO and/or PIO) and emulate a PCI device. Each server has its own shared page to receive ioreqs, so two HVM params have been introduced to set/get the first and the last shared page used for ioreqs. With its id, the server is able to retrieve its page.

Introduce a new ioreq type, IOREQ_TYPE_PCI_CONFIG, which makes it easy to forward PCI config space accesses.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 xen/include/asm-x86/hvm/domain.h |   25 ++++++++++++++++++-
 xen/include/asm-x86/hvm/vcpu.h   |    4 ++-
 xen/include/public/hvm/hvm_op.h  |   51 ++++++++++++++++++++++++++++++++++++++
 xen/include/public/hvm/ioreq.h   |    1 +
 xen/include/public/hvm/params.h  |    6 +++-
 xen/include/public/xen.h         |    1 +
 xen/include/xen/hvm/pci_emul.h   |   29 +++++++++++++++++++++
 7 files changed, 114 insertions(+), 3 deletions(-)
 create mode 100644 xen/include/xen/hvm/pci_emul.h

diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 27b3de5..49d1ca0 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -28,6 +28,7 @@
 #include <asm/hvm/vioapic.h>
 #include <asm/hvm/io.h>
 #include <xen/hvm/iommu.h>
+#include <xen/hvm/pci_emul.h>
 #include <asm/hvm/viridian.h>
 #include <asm/hvm/vmx/vmcs.h>
 #include <asm/hvm/svm/vmcb.h>
@@ -41,14 +42,36 @@ struct hvm_ioreq_page {
     void *va;
 };
 
+struct hvm_io_range {
+    uint64_t s, e;
+    struct hvm_io_range *next;
+};
+
+struct hvm_ioreq_server {
+    unsigned int id;
+    domid_t domid;
+    struct hvm_io_range *mmio_range_list;
+    struct hvm_io_range *portio_range_list;
+    struct hvm_ioreq_server *next;
+    struct hvm_ioreq_page ioreq;
+    struct hvm_ioreq_page buf_ioreq;
+    unsigned int buf_ioreq_evtchn;
+};
+
 struct hvm_domain {
+    /* Use for the IO handles by Xen */
     struct hvm_ioreq_page ioreq;
-    struct hvm_ioreq_page buf_ioreq;
+    struct hvm_ioreq_server *ioreq_server_list;
+    uint32_t nr_ioreq_server;
+    spinlock_t ioreq_server_lock;
 
     struct pl_time pl_time;
 
     struct hvm_io_handler *io_handler;
 
+    /* PCI Information */
+    struct pci_root_emul pci_root;
+
     /* Lock protects access to irq, vpic and vioapic. */
     spinlock_t irq_lock;
     struct hvm_irq irq;
diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h
index 9d68ed2..812b16e 100644
--- a/xen/include/asm-x86/hvm/vcpu.h
+++ b/xen/include/asm-x86/hvm/vcpu.h
@@ -125,7 +125,9 @@ struct hvm_vcpu {
     spinlock_t tm_lock;
     struct list_head tm_list;
 
-    int xen_port;
+    struct hvm_ioreq_page *ioreq;
+    /* PCI Information */
+    uint32_t pci_cf8;
 
     bool_t flag_dr_dirty;
     bool_t debug_state_latch;
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index a9aab4b..6b17c5f 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -23,6 +23,9 @@
 
 #include "../xen.h"
 #include "../trace.h"
+#include "../event_channel.h"
+
+#include "hvm_info_table.h" /* HVM_MAX_VCPUS */
 
 /* Get/set subcommands: extra argument == pointer to xen_hvm_param struct. */
 #define HVMOP_set_param 0
@@ -238,6 +241,54 @@ struct xen_hvm_inject_trap {
 typedef struct xen_hvm_inject_trap xen_hvm_inject_trap_t;
 DEFINE_XEN_GUEST_HANDLE(xen_hvm_inject_trap_t);
 
+#define HVMOP_register_ioreq_server 20
+struct xen_hvm_register_ioreq_server {
+    domid_t domid;  /* IN - domain to be serviced */
+    ioservid_t id;  /* OUT - handle for identifying this server */
+};
+typedef struct xen_hvm_register_ioreq_server xen_hvm_register_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_register_ioreq_server_t);
+
+#define HVMOP_get_ioreq_server_buf_channel 21
+struct xen_hvm_get_ioreq_server_buf_channel {
+    domid_t domid;          /* IN - domain to be serviced */
+    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
+    evtchn_port_t channel;  /* OUT - buf ioreq channel */
+};
+typedef struct xen_hvm_get_ioreq_server_buf_channel xen_hvm_get_ioreq_server_buf_channel_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_ioreq_server_buf_channel_t);
+
+#define HVMOP_map_io_range_to_ioreq_server 22
+struct xen_hvm_map_io_range_to_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    int is_mmio;            /* IN - MMIO or port IO? */
+    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
+    uint64_aligned_t s, e;  /* IN - inclusive start and end of range */
+};
+typedef struct xen_hvm_map_io_range_to_ioreq_server xen_hvm_map_io_range_to_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_map_io_range_to_ioreq_server_t);
+
+#define HVMOP_unmap_io_range_from_ioreq_server 23
+struct xen_hvm_unmap_io_range_from_ioreq_server {
+    domid_t domid;          /* IN - domain to be serviced */
+    uint8_t is_mmio;        /* IN - MMIO or port IO? */
+    ioservid_t id;          /* IN - handle from HVMOP_register_ioreq_server */
+    uint64_aligned_t addr;  /* IN - address inside the range to remove */
+};
+typedef struct xen_hvm_unmap_io_range_from_ioreq_server xen_hvm_unmap_io_range_from_ioreq_server_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_unmap_io_range_from_ioreq_server_t);
+
+#define HVMOP_register_pcidev 24
+struct xen_hvm_register_pcidev {
+    domid_t domid;  /* IN - domain to be serviced */
+    ioservid_t id;  /* IN - handle from HVMOP_register_ioreq_server */
+    /* IN - PCI identification in PCI topology (domain:bus:device:function) */
+    uint8_t domain, bus, device, function;
+};
+typedef struct xen_hvm_register_pcidev xen_hvm_register_pcidev_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_register_pcidev_t);
+
+
 #endif /* defined(__XEN__) || defined(__XEN_TOOLS__) */
 
 #define HVMOP_get_mem_type 15
diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
index 4022a1d..87aacd3 100644
--- a/xen/include/public/hvm/ioreq.h
+++ b/xen/include/public/hvm/ioreq.h
@@ -34,6 +34,7 @@
 #define IOREQ_TYPE_PIO 0 /* pio */
 #define IOREQ_TYPE_COPY 1 /* mmio ops */
+#define IOREQ_TYPE_PCI_CONFIG 2 /* pci config space ops */
 #define IOREQ_TYPE_TIMEOFFSET 7
 #define IOREQ_TYPE_INVALIDATE 8 /* mapcache */
diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h
index 55c1b57..309ac1b 100644
--- a/xen/include/public/hvm/params.h
+++ b/xen/include/public/hvm/params.h
@@ -147,6 +147,10 @@
 #define HVM_PARAM_ACCESS_RING_PFN 28
 #define HVM_PARAM_SHARING_RING_PFN 29
 
-#define HVM_NR_PARAMS 30
+/* Param for ioreq servers */
+#define HVM_PARAM_IO_PFN_FIRST 30
+#define HVM_PARAM_IO_PFN_LAST 31
+
+#define HVM_NR_PARAMS 32
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
diff --git a/xen/include/public/xen.h b/xen/include/public/xen.h
index b2f6c50..0de17b2 100644
--- a/xen/include/public/xen.h
+++ b/xen/include/public/xen.h
@@ -466,6 +466,7 @@ DEFINE_XEN_GUEST_HANDLE(mmuext_op_t);
 #ifndef __ASSEMBLY__
 
 typedef uint16_t domid_t;
+typedef uint32_t ioservid_t;
 
 /* Domain ids >= DOMID_FIRST_RESERVED cannot be used for ordinary domains. */
 #define DOMID_FIRST_RESERVED (0x7FF0U)
diff --git a/xen/include/xen/hvm/pci_emul.h b/xen/include/xen/hvm/pci_emul.h
new file mode 100644
index 0000000..4dfb577
--- /dev/null
+++ b/xen/include/xen/hvm/pci_emul.h
@@ -0,0 +1,29 @@
+#ifndef PCI_EMUL_H_
+# define PCI_EMUL_H_
+
+# include <xen/radix-tree.h>
+# include <xen/spinlock.h>
+# include <xen/types.h>
+
+void hvm_init_pci_emul(struct domain *d);
+void hvm_destroy_pci_emul(struct domain *d);
+int hvm_register_pcidev(domid_t domid, ioservid_t id,
+                        uint8_t domain, uint8_t bus,
+                        uint8_t device, uint8_t function);
+
+struct pci_root_emul {
+    spinlock_t pci_lock;
+    struct radix_tree_root pci_list;
+};
+
+#endif /* !PCI_EMUL_H_ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
Julien Grall
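
As a rough illustration of how a server id maps to a shared page inside the window delimited by HVM_PARAM_IO_PFN_FIRST and HVM_PARAM_IO_PFN_LAST, here is a standalone sketch. It assumes the layout that hvm_alloc_ioreq_server_page() uses in patch 02 of this series: the first frame is the default page used by Xen, followed by two frames per server (synchronous ioreq page, then buffered ioreq page), with server ids starting at 1. The function name `ioreq_server_pfn` and the enum are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Which of a server's two shared pages is wanted (hypothetical names). */
enum ioreq_page_kind {
    IOREQ_PAGE_SYNC     = 0, /* per-vcpu synchronous ioreq page */
    IOREQ_PAGE_BUFFERED = 1  /* buffered ioreq page */
};

/*
 * Compute the guest frame number of a server's page.
 *   first, last: values of HVM_PARAM_IO_PFN_FIRST / HVM_PARAM_IO_PFN_LAST
 *   id:          server id as returned at registration (starts at 1)
 * Returns 0 and stores the frame in *pfn, or -1 if the id is invalid or
 * the frame would fall outside the window.
 */
int ioreq_server_pfn(uint64_t first, uint64_t last, uint32_t id,
                     enum ioreq_page_kind kind, uint64_t *pfn)
{
    uint64_t candidate;

    assert(pfn != NULL);

    if (id == 0)
        return -1;

    /* Frame 'first' itself is Xen's default page, hence the "+ 1". */
    candidate = first + (uint64_t)(id - 1) * 2 + (uint64_t)kind + 1;
    if (candidate > last)
        return -1;

    *pfn = candidate;
    return 0;
}
```

For a window of frames 100 to 104, server 1 gets frames 101 and 102, server 2 gets 103 and 104, and a third server does not fit.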
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 02/17] hvm: Add functions to handle ioreq servers
This patch adds functions to : - create/destroy server - map/unmap IO range to a server Signed-off-by: Julien Grall <julien.grall@citrix.com> --- xen/arch/x86/hvm/hvm.c | 356 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 356 insertions(+), 0 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 7f8a025c..687e480 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -354,6 +354,37 @@ void hvm_do_resume(struct vcpu *v) } } +static void hvm_init_ioreq_servers(struct domain *d) +{ + spin_lock_init(&d->arch.hvm_domain.ioreq_server_lock); + d->arch.hvm_domain.nr_ioreq_server = 0; +} + +static int hvm_ioreq_servers_new_vcpu(struct vcpu *v) +{ + struct hvm_ioreq_server *s; + struct domain *d = v->domain; + shared_iopage_t *p; + int rc = 0; + + spin_lock(&d->arch.hvm_domain.ioreq_server_lock); + + for ( s = d->arch.hvm_domain.ioreq_server_list; s != NULL; s = s->next ) + { + p = s->ioreq.va; + ASSERT(p != NULL); + + rc = alloc_unbound_xen_event_channel(v, s->domid, NULL); + if ( rc < 0 ) + break; + p->vcpu_ioreq[v->vcpu_id].vp_eport = rc; + } + + spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); + + return (rc < 0) ? 
rc : 0; +} + static void hvm_init_ioreq_page( struct domain *d, struct hvm_ioreq_page *iorp) { @@ -559,6 +590,59 @@ int hvm_domain_initialise(struct domain *d) return rc; } +static void hvm_destroy_ioreq_server(struct domain *d, + struct hvm_ioreq_server *s) +{ + struct hvm_io_range *x; + shared_iopage_t *p; + int i; + + while ( (x = s->mmio_range_list) != NULL ) + { + s->mmio_range_list = x->next; + xfree(x); + } + while ( (x = s->portio_range_list) != NULL ) + { + s->portio_range_list = x->next; + xfree(x); + } + + p = s->ioreq.va; + + for ( i = 0; i < MAX_HVM_VCPUS; i++ ) + { + if ( p->vcpu_ioreq[i].vp_eport ) + { + free_xen_event_channel(d->vcpu[i], p->vcpu_ioreq[i].vp_eport); + } + } + + free_xen_event_channel(d->vcpu[0], s->buf_ioreq_evtchn); + + hvm_destroy_ioreq_page(d, &s->ioreq); + hvm_destroy_ioreq_page(d, &s->buf_ioreq); + + xfree(s); +} + +static void hvm_destroy_ioreq_servers(struct domain *d) +{ + struct hvm_ioreq_server *s; + + spin_lock(&d->arch.hvm_domain.ioreq_server_lock); + + ASSERT(d->is_dying); + + while ( (s = d->arch.hvm_domain.ioreq_server_list) != NULL ) + { + d->arch.hvm_domain.ioreq_server_list = s->next; + hvm_destroy_ioreq_server(d, s); + } + + spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); +} + void hvm_domain_relinquish_resources(struct domain *d) { hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq); @@ -3686,6 +3770,278 @@ static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid, return 0; } +static int hvm_alloc_ioreq_server_page(struct domain *d, + struct hvm_ioreq_server *s, + struct hvm_ioreq_page *pfn, + int i) +{ + int rc = 0; + unsigned long gmfn; + + if (i < 0 || i > 1) + return -EINVAL; + + hvm_init_ioreq_page(d, pfn); + + gmfn = d->arch.hvm_domain.params[HVM_PARAM_IO_PFN_FIRST] + + (s->id - 1) * 2 + i + 1; + + if (gmfn > d->arch.hvm_domain.params[HVM_PARAM_IO_PFN_LAST]) + return -EINVAL; + + rc = hvm_set_ioreq_page(d, pfn, gmfn); + + if (!rc && pfn->va == NULL) + rc = -ENOMEM; + + return rc; +} + 
+static int hvmop_register_ioreq_server( + struct xen_hvm_register_ioreq_server *a) +{ + struct hvm_ioreq_server *s, **pp; + struct domain *d; + shared_iopage_t *p; + struct vcpu *v; + int i; + int rc = 0; + + if ( current->domain->domain_id == a->domid ) + return -EINVAL; + + rc = rcu_lock_target_domain_by_id(a->domid, &d); + if ( rc != 0 ) + return rc; + + if ( !is_hvm_domain(d) ) + { + rcu_unlock_domain(d); + return -EINVAL; + } + + s = xmalloc(struct hvm_ioreq_server); + if ( s == NULL ) + { + rcu_unlock_domain(d); + return -ENOMEM; + } + memset(s, 0, sizeof(*s)); + + if ( d->is_dying) + { + rc = -EINVAL; + goto register_died; + } + + spin_lock(&d->arch.hvm_domain.ioreq_server_lock); + + s->id = d->arch.hvm_domain.nr_ioreq_server + 1; + s->domid = current->domain->domain_id; + + /* Initialize shared pages */ + if ( (rc = hvm_alloc_ioreq_server_page(d, s, &s->ioreq, 0)) ) + goto register_ioreq; + if ( (rc = hvm_alloc_ioreq_server_page(d, s, &s->buf_ioreq, 1)) ) + goto register_buf_ioreq; + + p = s->ioreq.va; + + for_each_vcpu ( d, v ) + { + rc = alloc_unbound_xen_event_channel(v, s->domid, NULL); + if ( rc < 0 ) + goto register_ports; + p->vcpu_ioreq[v->vcpu_id].vp_eport = rc; + } + + /* Allocate buffer event channel */ + rc = alloc_unbound_xen_event_channel(d->vcpu[0], s->domid, NULL); + + if (rc < 0) + goto register_ports; + s->buf_ioreq_evtchn = rc; + + pp = &d->arch.hvm_domain.ioreq_server_list; + while ( *pp != NULL ) + pp = &(*pp)->next; + *pp = s; + + d->arch.hvm_domain.nr_ioreq_server += 1; + a->id = s->id; + + spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); + rcu_unlock_domain(d); + + goto register_done; + +register_ports: + p = s->ioreq.va; + for ( i = 0; i < MAX_HVM_VCPUS; i++ ) + { + if ( p->vcpu_ioreq[i].vp_eport ) + free_xen_event_channel(d->vcpu[i], p->vcpu_ioreq[i].vp_eport); + } + hvm_destroy_ioreq_page(d, &s->buf_ioreq); +register_buf_ioreq: + hvm_destroy_ioreq_page(d, &s->ioreq); +register_ioreq: + 
spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); +register_died: + xfree(s); + rcu_unlock_domain(d); +register_done: + return 0; +} + +static int hvmop_get_ioreq_server_buf_channel( + struct xen_hvm_get_ioreq_server_buf_channel *a) +{ + struct domain *d; + struct hvm_ioreq_server *s; + int rc; + + rc = rcu_lock_target_domain_by_id(a->domid, &d); + + if ( rc != 0 ) + return rc; + + if ( !is_hvm_domain(d) ) + { + rcu_unlock_domain(d); + return -EINVAL; + } + + spin_lock(&d->arch.hvm_domain.ioreq_server_lock); + s = d->arch.hvm_domain.ioreq_server_list; + + while ( (s != NULL) && (s->id != a->id) ) + s = s->next; + + if ( s == NULL ) + { + spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); + rcu_unlock_domain(d); + return -ENOENT; + } + + a->channel = s->buf_ioreq_evtchn; + + spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); + rcu_unlock_domain(d); + + return 0; +} + +static int hvmop_map_io_range_to_ioreq_server( + struct xen_hvm_map_io_range_to_ioreq_server *a) +{ + struct hvm_ioreq_server *s; + struct hvm_io_range *x; + struct domain *d; + int rc; + + rc = rcu_lock_target_domain_by_id(a->domid, &d); + if ( rc != 0 ) + return rc; + + if ( !is_hvm_domain(d) ) + { + rcu_unlock_domain(d); + return -EINVAL; + } + + spin_lock(&d->arch.hvm_domain.ioreq_server_lock); + + x = xmalloc(struct hvm_io_range); + s = d->arch.hvm_domain.ioreq_server_list; + while ( (s != NULL) && (s->id != a->id) ) + s = s->next; + if ( (s == NULL) || (x == NULL) ) + { + xfree(x); + spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); + rcu_unlock_domain(d); + return x ? 
-ENOENT : -ENOMEM; + } + + x->s = a->s; + x->e = a->e; + if ( a->is_mmio ) + { + x->next = s->mmio_range_list; + s->mmio_range_list = x; + } + else + { + x->next = s->portio_range_list; + s->portio_range_list = x; + } + + spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); + rcu_unlock_domain(d); + return 0; +} + +static int hvmop_unmap_io_range_from_ioreq_server( + struct xen_hvm_unmap_io_range_from_ioreq_server *a) +{ + struct hvm_ioreq_server *s; + struct hvm_io_range *x, **xp; + struct domain *d; + int rc; + + rc = rcu_lock_target_domain_by_id(a->domid, &d); + if ( rc != 0 ) + return rc; + + if ( !is_hvm_domain(d) ) + { + rcu_unlock_domain(d); + return -EINVAL; + } + + spin_lock(&d->arch.hvm_domain.ioreq_server_lock); + + s = d->arch.hvm_domain.ioreq_server_list; + while ( (s != NULL) && (s->id != a->id) ) + s = s->next; + if ( (s == NULL) ) + { + spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); + rcu_unlock_domain(d); + return -ENOENT; + } + + if ( a->is_mmio ) + { + x = s->mmio_range_list; + xp = &s->mmio_range_list; + } + else + { + x = s->portio_range_list; + xp = &s->portio_range_list; + } + while ( (x != NULL) && (a->addr < x->s || a->addr > x->e) ) + { + xp = &x->next; + x = x->next; + } + if ( (x != NULL) ) + { + *xp = x->next; + xfree(x); + rc = 0; + } + else + rc = -ENOENT; + + spin_unlock(&d->arch.hvm_domain.ioreq_server_lock); + rcu_unlock_domain(d); + return rc; +} + long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) { -- Julien Grall
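
The map/unmap logic in this patch boils down to a singly linked list of inclusive ranges. The following standalone sketch shows that technique outside the hypervisor, using malloc/free instead of xmalloc/xfree and without any locking. The names `io_range`, `range_map`, `range_unmap` and `range_contains` are hypothetical, and the `s > e` check is an addition of this sketch that the patch itself does not perform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One inclusive IO range [s, e], chained in a singly linked list. */
struct io_range {
    uint64_t s, e;
    struct io_range *next;
};

/* Prepend the range [s, e] to the list. Returns 0, or -1 on failure. */
int range_map(struct io_range **head, uint64_t s, uint64_t e)
{
    struct io_range *x;

    assert(head != NULL);

    if (s > e) /* extra sanity check, not present in the patch */
        return -1;

    x = malloc(sizeof(*x));
    if (x == NULL)
        return -1;

    x->s = s;
    x->e = e;
    x->next = *head;
    *head = x;
    return 0;
}

/*
 * Remove the first range that contains addr, as the unmap hypercall does.
 * Returns 0 on success, -1 if no range contains addr.
 */
int range_unmap(struct io_range **head, uint64_t addr)
{
    struct io_range **xp = head;
    struct io_range *x;

    assert(head != NULL);

    while (*xp != NULL && (addr < (*xp)->s || addr > (*xp)->e))
        xp = &(*xp)->next;

    if (*xp == NULL)
        return -1;

    x = *xp;
    *xp = x->next; /* unlink through the pointer-to-pointer */
    free(x);
    return 0;
}

/* Return 1 if any range in the list contains addr, else 0. */
int range_contains(const struct io_range *head, uint64_t addr)
{
    for (; head != NULL; head = head->next)
        if (addr >= head->s && addr <= head->e)
            return 1;
    return 0;
}
```

Walking the list through a pointer to the previous `next` field, as the patch does with `xp`, avoids a special case for removing the head element.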
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 03/17] hvm-pci: Handle PCI config space in Xen
Add a function to register a bdf with a server. A handler has been added to catch ioport accesses (cf8 -> cff). When Xen receives a PIO for cf8, it stores the value inside the current vcpu until it receives a PIO for cfc -> cff. In that case, it checks whether the bdf is registered and forges the ioreq that will later be forwarded to the server.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 xen/arch/x86/hvm/Makefile   |    1 +
 xen/arch/x86/hvm/pci_emul.c |  168 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 169 insertions(+), 0 deletions(-)
 create mode 100644 xen/arch/x86/hvm/pci_emul.c

diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea5555..585e9c9 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -12,6 +12,7 @@ obj-y += irq.o
 obj-y += mtrr.o
 obj-y += nestedhvm.o
 obj-y += pmtimer.o
+obj-y += pci_emul.o
 obj-y += quirks.o
 obj-y += rtc.o
 obj-y += save.o
diff --git a/xen/arch/x86/hvm/pci_emul.c b/xen/arch/x86/hvm/pci_emul.c
new file mode 100644
index 0000000..48456dd
--- /dev/null
+++ b/xen/arch/x86/hvm/pci_emul.c
@@ -0,0 +1,168 @@
+#include <asm/hvm/support.h>
+#include <xen/hvm/pci_emul.h>
+#include <xen/pci.h>
+#include <xen/sched.h>
+#include <xen/xmalloc.h>
+
+#define PCI_DEBUGSTR "%x:%x.%x"
+#define PCI_DEBUG(bdf) ((bdf) >> 8) & 0xff, ((bdf) >> 3) & 0x1f, ((bdf)) & 0x7
+#define PCI_MASK_BDF(bdf) (((bdf) & 0x00ffff00) >> 8)
+#define PCI_CMP_BDF(Pci, Bdf) ((pci)->bdf == PCI_MASK_BDF(Bdf))
+
+static int handle_config_space(int dir, uint32_t port, uint32_t bytes,
+                               uint32_t *val)
+{
+    uint32_t pci_cf8;
+    struct hvm_ioreq_server *s;
+    ioreq_t *p = get_ioreq(current);
+    int rc = X86EMUL_UNHANDLEABLE;
+    struct vcpu *v = current;
+
+    if ( port == 0xcf8 && bytes == 4 )
+    {
+        if ( dir == IOREQ_READ )
+            *val = v->arch.hvm_vcpu.pci_cf8;
+        else
+            v->arch.hvm_vcpu.pci_cf8 = *val;
+        return X86EMUL_OKAY;
+    }
+    else if ( port < 0xcfc )
+        return X86EMUL_UNHANDLEABLE;
+
+    spin_lock(&v->domain->arch.hvm_domain.pci_root.pci_lock);
+    spin_lock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+
+    pci_cf8 = v->arch.hvm_vcpu.pci_cf8;
+
+    /* Retrieve PCI */
+    s = radix_tree_lookup(&v->domain->arch.hvm_domain.pci_root.pci_list,
+                          PCI_MASK_BDF(pci_cf8));
+
+    if ( unlikely(s == NULL) )
+    {
+        *val = ~0;
+        rc = X86EMUL_OKAY;
+        goto end_handle;
+    }
+
+    /**
+     * We just fill the ioreq, hvm_send_assist_req will send the request
+     * The size is used to find the right access
+     **/
+    /* We use the 16 high-bits for the offset (0 => 0xcfc, 1 => 0xcfd...) */
+    p->size = (p->addr - 0xcfc) << 16 | (p->size & 0xffff);
+    p->type = IOREQ_TYPE_PCI_CONFIG;
+    p->addr = pci_cf8;
+
+    set_ioreq(v, &s->ioreq, p);
+
+end_handle:
+    spin_unlock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+    spin_unlock(&v->domain->arch.hvm_domain.pci_root.pci_lock);
+
+    return rc;
+}
+
+int hvm_register_pcidev(domid_t domid, ioservid_t id,
+                        uint8_t domain, uint8_t bus,
+                        uint8_t device, uint8_t function)
+{
+    struct domain *d;
+    struct hvm_ioreq_server *s;
+    int rc = 0;
+    struct radix_tree_root *tree;
+    uint16_t bdf = 0;
+
+    /* For the moment we don't handle pci when domain != 0 */
+    if ( domain != 0 )
+        return -EINVAL;
+
+    rc = rcu_lock_target_domain_by_id(domid, &d);
+
+    if ( rc != 0 )
+        return rc;
+
+    if ( !is_hvm_domain(d) )
+    {
+        rcu_unlock_domain(d);
+        return -EINVAL;
+    }
+
+    /* Search server */
+    spin_lock(&d->arch.hvm_domain.ioreq_server_lock);
+    s = d->arch.hvm_domain.ioreq_server_list;
+    while ( (s != NULL) && (s->id != id) )
+        s = s->next;
+
+    spin_unlock(&d->arch.hvm_domain.ioreq_server_lock);
+
+    if ( s == NULL )
+    {
+        gdprintk(XENLOG_ERR, "Cannot find server %u\n", id);
+        rc = -ENOENT;
+        goto fail;
+    }
+
+    spin_lock(&d->arch.hvm_domain.pci_root.pci_lock);
+
+    tree = &d->arch.hvm_domain.pci_root.pci_list;
+
+    bdf |= ((uint16_t)bus) << 8;
+    bdf |= ((uint16_t)device & 0x1f) << 3;
+    bdf |= ((uint16_t)function & 0x7);
+
+    if ( radix_tree_lookup(tree, bdf) )
+    {
+        rc = -EEXIST;
+        gdprintk(XENLOG_ERR, "Bdf " PCI_DEBUGSTR " is already allocated\n",
+                 PCI_DEBUG(bdf));
+        goto create_end;
+    }
+
+    rc = radix_tree_insert(tree, bdf, s);
+    if ( rc )
+    {
+        gdprintk(XENLOG_ERR, "Cannot insert the bdf\n");
+        goto create_end;
+    }
+
+create_end:
+    spin_unlock(&d->arch.hvm_domain.pci_root.pci_lock);
+fail:
+    rcu_unlock_domain(d);
+
+    return rc;
+}
+
+void hvm_init_pci_emul(struct domain *d)
+{
+    struct pci_root_emul *root = &d->arch.hvm_domain.pci_root;
+
+    spin_lock_init(&root->pci_lock);
+
+    radix_tree_init(&root->pci_list);
+
+    /* Register the config space handler */
+    register_portio_handler(d, 0xcf8, 8, handle_config_space);
+}
+
+void hvm_destroy_pci_emul(struct domain *d)
+{
+    struct pci_root_emul *root = &d->arch.hvm_domain.pci_root;
+
+    spin_lock(&root->pci_lock);
+
+    radix_tree_destroy(&root->pci_list, NULL);
+
+    spin_unlock(&root->pci_lock);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-set-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
--
Julien Grall
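
The bit manipulation in this patch can be checked in isolation. The sketch below mirrors three pieces of it: the key built by hvm_register_pcidev() from bus/device/function, the key extracted from a latched 0xcf8 value by PCI_MASK_BDF(), and the packing of the 0xcfc..0xcff offset into the upper 16 bits of the ioreq size. The function names are hypothetical, and the example assumes the usual layout of the 0xcf8 address register (bus in bits 23:16, device in bits 15:11, function in bits 10:8).

```c
#include <assert.h>
#include <stdint.h>

/* Build the 16-bit key: bus in bits 15:8, device in 7:3, function in 2:0. */
uint16_t pci_make_bdf(uint8_t bus, uint8_t device, uint8_t function)
{
    return (uint16_t)(((uint16_t)bus << 8) |
                      ((device & 0x1f) << 3) |
                      (function & 0x7));
}

/* Same key, extracted from a value written to port 0xcf8. */
uint16_t pci_cf8_to_bdf(uint32_t cf8)
{
    return (uint16_t)((cf8 & 0x00ffff00u) >> 8);
}

/*
 * Pack the data-port offset (0 for 0xcfc ... 3 for 0xcff) into the upper
 * 16 bits and the access width in bytes into the lower 16 bits.
 */
uint32_t pci_pack_size(uint32_t port, uint32_t bytes)
{
    assert(port >= 0xcfc && port <= 0xcff);
    return ((port - 0xcfc) << 16) | (bytes & 0xffff);
}
```

Both routes must agree on the key, otherwise the radix tree lookup in the config space handler would never find a device registered through the hypercall.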
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 04/17] hvm: Change initialization/destruction of an hvm
Prepare/Release structure for multiple ioreq servers. Signed-off-by: Julien Grall <julien.grall@citrix.com> --- xen/arch/x86/hvm/hvm.c | 33 ++++++++++----------------------- 1 files changed, 10 insertions(+), 23 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 687e480..292d57b 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -567,10 +567,13 @@ int hvm_domain_initialise(struct domain *d) rtc_init(d); hvm_init_ioreq_page(d, &d->arch.hvm_domain.ioreq); - hvm_init_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq); + hvm_init_ioreq_servers(d); register_portio_handler(d, 0xe9, 1, hvm_print_line); + if ( hvm_init_pci_emul(d) ) + goto fail2; + rc = hvm_funcs.domain_initialise(d); if ( rc != 0 ) goto fail2; @@ -645,8 +648,8 @@ static void hvm_destroy_ioreq_servers(struct domain *d) void hvm_domain_relinquish_resources(struct domain *d) { - hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq); - hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.buf_ioreq); + hvm_destroy_ioreq_servers(d); + hvm_destroy_pci_emul(d); msixtbl_pt_cleanup(d); @@ -1104,27 +1107,11 @@ int hvm_vcpu_initialise(struct vcpu *v) && (rc = nestedhvm_vcpu_initialise(v)) < 0 ) goto fail3; - /* Create ioreq event channel. */ - rc = alloc_unbound_xen_event_channel(v, 0, NULL); - if ( rc < 0 ) - goto fail4; - - /* Register ioreq event channel. */ - v->arch.hvm_vcpu.xen_port = rc; - - if ( v->vcpu_id == 0 ) - { - /* Create bufioreq event channel. 
*/ - rc = alloc_unbound_xen_event_channel(v, 0, NULL); - if ( rc < 0 ) - goto fail2; - v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN] = rc; - } + rc = hvm_ioreq_servers_new_vcpu(v); + if ( rc != 0 ) + goto fail3; - spin_lock(&v->domain->arch.hvm_domain.ioreq.lock); - if ( v->domain->arch.hvm_domain.ioreq.va != NULL ) - get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port; - spin_unlock(&v->domain->arch.hvm_domain.ioreq.lock); + v->arch.hvm_vcpu.ioreq = &v->domain->arch.hvm_domain.ioreq; spin_lock_init(&v->arch.hvm_vcpu.tm_lock); INIT_LIST_HEAD(&v->arch.hvm_vcpu.tm_list); -- Julien Grall
[XEN][RFC PATCH V2 05/17] hvm: Modify hvm_op
This patch removes useless hvm_param due to structure modification and bind new hypercalls to handle ioreq servers and PCI. Signed-off-by: Julien Grall <julien.grall@citrix.com> --- xen/arch/x86/hvm/hvm.c | 150 +++++++++++++++++++++------------------ xen/include/public/hvm/params.h | 5 -- 2 files changed, 81 insertions(+), 74 deletions(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 292d57b..a2cd9b3 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -571,8 +571,7 @@ int hvm_domain_initialise(struct domain *d) register_portio_handler(d, 0xe9, 1, hvm_print_line); - if ( hvm_init_pci_emul(d) ) - goto fail2; + hvm_init_pci_emul(d); rc = hvm_funcs.domain_initialise(d); if ( rc != 0 ) @@ -650,6 +649,7 @@ void hvm_domain_relinquish_resources(struct domain *d) { hvm_destroy_ioreq_servers(d); hvm_destroy_pci_emul(d); + hvm_destroy_ioreq_page(d, &d->arch.hvm_domain.ioreq); msixtbl_pt_cleanup(d); @@ -3742,21 +3742,6 @@ static int hvmop_flush_tlb_all(void) return 0; } -static int hvm_replace_event_channel(struct vcpu *v, domid_t remote_domid, - int *p_port) -{ - int old_port, new_port; - - new_port = alloc_unbound_xen_event_channel(v, remote_domid, NULL); - if ( new_port < 0 ) - return new_port; - - /* xchg() ensures that only we call free_xen_event_channel(). 
*/ - old_port = xchg(p_port, new_port); - free_xen_event_channel(v, old_port); - return 0; -} - static int hvm_alloc_ioreq_server_page(struct domain *d, struct hvm_ioreq_server *s, struct hvm_ioreq_page *pfn, @@ -4041,7 +4026,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) case HVMOP_get_param: { struct xen_hvm_param a; - struct hvm_ioreq_page *iorp; struct domain *d; struct vcpu *v; @@ -4069,20 +4053,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) switch ( a.index ) { - case HVM_PARAM_IOREQ_PFN: - iorp = &d->arch.hvm_domain.ioreq; - if ( (rc = hvm_set_ioreq_page(d, iorp, a.value)) != 0 ) - break; - spin_lock(&iorp->lock); - if ( iorp->va != NULL ) - /* Initialise evtchn port info if VCPUs already created. */ - for_each_vcpu ( d, v ) - get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port; - spin_unlock(&iorp->lock); + case HVM_PARAM_IO_PFN_FIRST: + rc = hvm_set_ioreq_page(d, &d->arch.hvm_domain.ioreq, a.value); break; - case HVM_PARAM_BUFIOREQ_PFN: - iorp = &d->arch.hvm_domain.buf_ioreq; - rc = hvm_set_ioreq_page(d, iorp, a.value); + case HVM_PARAM_IO_PFN_LAST: + if ( (d->arch.hvm_domain.params[HVM_PARAM_IO_PFN_LAST]) ) + rc = -EINVAL; break; case HVM_PARAM_CALLBACK_IRQ: hvm_set_callback_via(d, a.value); @@ -4128,41 +4104,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) domctl_lock_release(); break; - case HVM_PARAM_DM_DOMAIN: - /* Not reflexive, as we must domain_pause(). 
*/ - rc = -EPERM; - if ( curr_d == d ) - break; - - if ( a.value == DOMID_SELF ) - a.value = curr_d->domain_id; - - rc = 0; - domain_pause(d); /* safe to change per-vcpu xen_port */ - if ( d->vcpu[0] ) - rc = hvm_replace_event_channel(d->vcpu[0], a.value, - (int *)&d->vcpu[0]->domain->arch.hvm_domain.params - [HVM_PARAM_BUFIOREQ_EVTCHN]); - if ( rc ) - { - domain_unpause(d); - break; - } - iorp = &d->arch.hvm_domain.ioreq; - for_each_vcpu ( d, v ) - { - rc = hvm_replace_event_channel(v, a.value, - &v->arch.hvm_vcpu.xen_port); - if ( rc ) - break; - - spin_lock(&iorp->lock); - if ( iorp->va != NULL ) - get_ioreq(v)->vp_eport = v->arch.hvm_vcpu.xen_port; - spin_unlock(&iorp->lock); - } - domain_unpause(d); - break; case HVM_PARAM_ACPI_S_STATE: /* Not reflexive, as we must domain_pause(). */ rc = -EPERM; @@ -4213,9 +4154,6 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) if ( rc == 0 ) rc = nestedhvm_vcpu_initialise(v); break; - case HVM_PARAM_BUFIOREQ_EVTCHN: - rc = -EINVAL; - break; } if ( rc == 0 ) @@ -4669,6 +4607,80 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg) break; } + case HVMOP_register_ioreq_server: + { + struct xen_hvm_register_ioreq_server a; + + if ( copy_from_guest(&a, arg, 1) ) + return -EFAULT; + + rc = hvmop_register_ioreq_server(&a); + if ( rc != 0 ) + return rc; + + rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0; + break; + } + + case HVMOP_get_ioreq_server_buf_channel: + { + struct xen_hvm_get_ioreq_server_buf_channel a; + + if ( copy_from_guest(&a, arg, 1) ) + return -EFAULT; + + rc = hvmop_get_ioreq_server_buf_channel(&a); + if ( rc != 0 ) + return rc; + + rc = copy_to_guest(arg, &a, 1) ? 
-EFAULT : 0; + + break; + } + + case HVMOP_map_io_range_to_ioreq_server: + { + struct xen_hvm_map_io_range_to_ioreq_server a; + + if ( copy_from_guest(&a, arg, 1) ) + return -EFAULT; + + rc = hvmop_map_io_range_to_ioreq_server(&a); + if ( rc != 0 ) + return rc; + + break; + } + + case HVMOP_unmap_io_range_from_ioreq_server: + { + struct xen_hvm_unmap_io_range_from_ioreq_server a; + + if ( copy_from_guest(&a, arg, 1) ) + return -EFAULT; + + rc = hvmop_unmap_io_range_from_ioreq_server(&a); + if ( rc != 0 ) + return rc; + + break; + } + + case HVMOP_register_pcidev: + { + struct xen_hvm_register_pcidev a; + + if ( copy_from_guest(&a, arg, 1) ) + return -EFAULT; + + rc = hvm_register_pcidev(a.domid, a.id, a.domain, + a.bus, a.device, a.function); + if ( rc != 0 ) + return rc; + + break; + } + default: { gdprintk(XENLOG_DEBUG, "Bad HVM op %ld.\n", op); diff --git a/xen/include/public/hvm/params.h b/xen/include/public/hvm/params.h index 309ac1b..017493b 100644 --- a/xen/include/public/hvm/params.h +++ b/xen/include/public/hvm/params.h @@ -49,11 +49,6 @@ #define HVM_PARAM_PAE_ENABLED 4 -#define HVM_PARAM_IOREQ_PFN 5 - -#define HVM_PARAM_BUFIOREQ_PFN 6 -#define HVM_PARAM_BUFIOREQ_EVTCHN 26 - #ifdef __ia64__ #define HVM_PARAM_NVRAM_FD 7 -- Julien Grall
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 06/17] hvm-io: IO refactoring with ioreq server
This patch modifies several parts of the I/O handling path. Each vcpu now
contains a pointer to the current I/O shared page. A default shared page has
been created for I/O handled by Xen. Each time Xen receives an ioreq, it
starts with the default shared page and switches to the right shared page
once it knows which server handles the request.

Moreover, any I/O that can be handled neither by Xen nor by a server is
discarded directly inside Xen.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 xen/arch/x86/hvm/emulate.c        |   56 +++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/hvm.c            |    5 ++-
 xen/include/asm-x86/hvm/support.h |   26 ++++++++++++++--
 3 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 9bfba48..9e636b6 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -49,6 +49,55 @@ static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
     trace_var(event, 0/*!cycles*/, size, buffer);
 }
 
+static int hvmemul_prepare_assist(ioreq_t *p)
+{
+    struct vcpu *v = current;
+    struct hvm_ioreq_server *s;
+    int i;
+    int sign;
+    uint32_t data = ~0;
+
+    if ( p->type == IOREQ_TYPE_PCI_CONFIG )
+        return X86EMUL_UNHANDLEABLE;
+
+    spin_lock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+    for ( s = v->domain->arch.hvm_domain.ioreq_server_list; s; s = s->next )
+    {
+        struct hvm_io_range *x = (p->type == IOREQ_TYPE_COPY)
+            ? s->mmio_range_list : s->portio_range_list;
+
+        for ( ; x; x = x->next )
+        {
+            if ( (p->addr >= x->s) && (p->addr <= x->e) )
+                goto done_server_scan;
+        }
+    }
+
+    spin_unlock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+
+    sign = p->df ? -1 : 1;
+
+    if ( p->dir != IOREQ_WRITE )
+    {
+        if ( !p->data_is_ptr )
+            p->data = ~0;
+        else
+        {
+            for ( i = 0; i < p->count; i++ )
+                hvm_copy_to_guest_phys(p->data + sign * i * p->size, &data,
+                                       p->size);
+        }
+    }
+
+    return X86EMUL_OKAY;
+
+ done_server_scan:
+    set_ioreq(v, &s->ioreq, p);
+    spin_unlock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+
+    return X86EMUL_UNHANDLEABLE;
+}
+
 static int hvmemul_do_io(
     int is_mmio, paddr_t addr, unsigned long *reps, int size,
     paddr_t ram_gpa, int dir, int df, void *p_data)
@@ -173,6 +222,10 @@ static int hvmemul_do_io(
         (p_data == NULL) ? HVMIO_dispatched : HVMIO_awaiting_completion;
     vio->io_size = size;
 
+    /* Use the default shared page */
+    current->arch.hvm_vcpu.ioreq = &curr->domain->arch.hvm_domain.ioreq;
+    p = get_ioreq(current);
+
     p->dir = dir;
     p->data_is_ptr = value_is_ptr;
     p->type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO;
@@ -196,6 +249,9 @@ static int hvmemul_do_io(
         rc = hvm_portio_intercept(p);
     }
 
+    if ( rc == X86EMUL_UNHANDLEABLE )
+        rc = hvmemul_prepare_assist(p);
+
     switch ( rc )
     {
     case X86EMUL_OKAY:
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a2cd9b3..33ef0f2 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1223,14 +1223,15 @@ bool_t hvm_send_assist_req(struct vcpu *v)
         return 0;
     }
 
-    prepare_wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port);
+    prepare_wait_on_xen_event_channel(p->vp_eport);
 
     /*
      * Following happens /after/ blocking and setting up ioreq contents.
      * prepare_wait_on_xen_event_channel() is an implicit barrier.
      */
     p->state = STATE_IOREQ_READY;
-    notify_via_xen_event_channel(v->domain, v->arch.hvm_vcpu.xen_port);
+
+    notify_via_xen_event_channel(v->domain, p->vp_eport);
 
     return 1;
 }
diff --git a/xen/include/asm-x86/hvm/support.h b/xen/include/asm-x86/hvm/support.h
index f9b102f..44acd37 100644
--- a/xen/include/asm-x86/hvm/support.h
+++ b/xen/include/asm-x86/hvm/support.h
@@ -29,13 +29,31 @@
 
 static inline ioreq_t *get_ioreq(struct vcpu *v)
 {
-    struct domain *d = v->domain;
-    shared_iopage_t *p = d->arch.hvm_domain.ioreq.va;
-    ASSERT((v == current) || spin_is_locked(&d->arch.hvm_domain.ioreq.lock));
-    ASSERT(d->arch.hvm_domain.ioreq.va != NULL);
+    shared_iopage_t *p = v->arch.hvm_vcpu.ioreq->va;
+    ASSERT((v == current) || spin_is_locked(&v->arch.hvm_vcpu.ioreq->lock));
+    ASSERT(v->arch.hvm_vcpu.ioreq->va != NULL);
     return &p->vcpu_ioreq[v->vcpu_id];
 }
 
+static inline void set_ioreq(struct vcpu *v, struct hvm_ioreq_page *page,
+                             ioreq_t *p)
+{
+    ioreq_t *np;
+
+    v->arch.hvm_vcpu.ioreq = page;
+    spin_lock(&v->arch.hvm_vcpu.ioreq->lock);
+    np = get_ioreq(v);
+    np->dir = p->dir;
+    np->data_is_ptr = p->data_is_ptr;
+    np->type = p->type;
+    np->size = p->size;
+    np->addr = p->addr;
+    np->count = p->count;
+    np->df = p->df;
+    np->data = p->data;
+    spin_unlock(&v->arch.hvm_vcpu.ioreq->lock);
+}
+
 #define HVM_DELIVER_NO_ERROR_CODE  -1
 
 #ifndef NDEBUG
-- 
Julien Grall
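
Editor's sketch of the range lookup performed by hvmemul_prepare_assist() in patch 06 above. It is a standalone illustration, not code from the series: the structure names (io_range, ioreq_server) are simplified stand-ins, locking is omitted, and unclaimed_read_value() is an assumption about what a guest read observes when no server claims the access (the low `size` bytes all ones).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the structures used by the patch (hypothetical). */
struct io_range {
    uint64_t s, e;                  /* inclusive start and end of the range */
    struct io_range *next;
};

struct ioreq_server {
    int id;
    struct io_range *mmio_range_list;
    struct io_range *portio_range_list;
    struct ioreq_server *next;
};

/* Return the first server whose range list claims addr, or NULL if none. */
static struct ioreq_server *find_server(struct ioreq_server *list,
                                        int is_mmio, uint64_t addr)
{
    struct ioreq_server *s;
    struct io_range *x;

    for ( s = list; s != NULL; s = s->next )
        for ( x = is_mmio ? s->mmio_range_list : s->portio_range_list;
              x != NULL; x = x->next )
            if ( addr >= x->s && addr <= x->e )
                return s;

    return NULL;
}

/* Assumed value of an unclaimed read: the low `size` bytes are all ones. */
static uint64_t unclaimed_read_value(unsigned int size)
{
    return size >= 8 ? ~0ULL : (1ULL << (size * 8)) - 1;
}
```

When find_server() returns NULL, the patch completes the access inside Xen (writes are dropped, reads return all ones) instead of forwarding it to a device model.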
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 07/17] hvm-io: send invalidate map cache to each registered servers
When a mapcache invalidation occurs, Xen needs to send an
IOREQ_TYPE_INVALIDATE request to each server and wait until all I/O is
completed. We introduce a new function, hvm_wait_on_io, to wait until an
I/O is completed.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 xen/arch/x86/hvm/hvm.c |   41 ++++++++++++++++++++++++++++++++---------
 xen/arch/x86/hvm/io.c  |   15 +++++++++++++--
 2 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 33ef0f2..fdb2515 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -316,16 +316,9 @@ void hvm_migrate_pirqs(struct vcpu *v)
     spin_unlock(&d->event_lock);
 }
 
-void hvm_do_resume(struct vcpu *v)
+static void hvm_wait_on_io(struct vcpu *v, ioreq_t *p)
 {
-    ioreq_t *p;
-
-    pt_restore_timer(v);
-
-    check_wakeup_from_wait();
-
     /* NB. Optimised for common case (p->state == STATE_IOREQ_NONE). */
-    p = get_ioreq(v);
     while ( p->state != STATE_IOREQ_NONE )
     {
         switch ( p->state )
@@ -335,7 +328,7 @@ void hvm_do_resume(struct vcpu *v)
             break;
         case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
         case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(v->arch.hvm_vcpu.xen_port,
+            wait_on_xen_event_channel(p->vp_eport,
                                       (p->state != STATE_IOREQ_READY) &&
                                       (p->state != STATE_IOREQ_INPROCESS));
             break;
@@ -345,6 +338,36 @@ void hvm_do_resume(struct vcpu *v)
             return; /* bail */
         }
     }
+}
+
+void hvm_do_resume(struct vcpu *v)
+{
+    ioreq_t *p;
+    struct hvm_ioreq_server *s;
+    shared_iopage_t *page;
+
+    pt_restore_timer(v);
+
+    check_wakeup_from_wait();
+
+    p = get_ioreq(v);
+
+    if ( p->type == IOREQ_TYPE_INVALIDATE )
+    {
+        spin_lock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+        /* Wait all servers */
+        for ( s = v->domain->arch.hvm_domain.ioreq_server_list; s; s = s->next )
+        {
+            page = s->ioreq.va;
+            ASSERT((v == current) || spin_is_locked(&s->ioreq.lock));
+            ASSERT(s->ioreq.va != NULL);
+            v->arch.hvm_vcpu.ioreq = &s->ioreq;
+            hvm_wait_on_io(v, &page->vcpu_ioreq[v->vcpu_id]);
+        }
+        spin_unlock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+    }
+    else
+        hvm_wait_on_io(v, p);
 
     /* Inject pending hw/sw trap */
     if ( v->arch.hvm_vcpu.inject_trap.vector != -1 )
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index c20f4e8..b73a462 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -150,7 +150,8 @@ void send_timeoffset_req(unsigned long timeoff)
 void send_invalidate_req(void)
 {
     struct vcpu *v = current;
-    ioreq_t *p = get_ioreq(v);
+    ioreq_t p[1];
+    struct hvm_ioreq_server *s;
 
     if ( p->state != STATE_IOREQ_NONE )
     {
@@ -164,8 +165,18 @@ void send_invalidate_req(void)
     p->size = 4;
     p->dir = IOREQ_WRITE;
     p->data = ~0UL; /* flush all */
+    p->count = 0;
+    p->addr = 0;
+
+    spin_lock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+    for ( s = v->domain->arch.hvm_domain.ioreq_server_list; s; s = s->next )
+    {
+        set_ioreq(v, &s->ioreq, p);
+        (void)hvm_send_assist_req(v);
+    }
+    spin_unlock(&v->domain->arch.hvm_domain.ioreq_server_lock);
 
-    (void)hvm_send_assist_req(v);
+    set_ioreq(v, &v->domain->arch.hvm_domain.ioreq, p);
 }
 
 int handle_mmio(void)
-- 
Julien Grall
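
Editor's sketch of the broadcast-then-wait pattern described in patch 07 above. It is a toy model under stated assumptions: server_slot and the function names are hypothetical, and polling stands in for the event-channel wait that the real hvm_wait_on_io() performs.

```c
#include <assert.h>
#include <stddef.h>

/* States mirror the STATE_IOREQ_* / STATE_IORESP_* values named in the patch. */
enum ioreq_state { IOREQ_NONE, IOREQ_READY, IOREQ_INPROCESS, IORESP_READY };

/* One vcpu slot on one server's shared page (hypothetical stand-in). */
struct server_slot {
    enum ioreq_state state;
    struct server_slot *next;
};

/* Post the invalidate request on every server; return how many were notified. */
static unsigned int broadcast_invalidate(struct server_slot *list)
{
    unsigned int n = 0;
    struct server_slot *s;

    for ( s = list; s != NULL; s = s->next )
    {
        s->state = IOREQ_READY;
        n++;
    }
    return n;
}

/* A server consumes the request and posts its response. */
static void server_complete(struct server_slot *s)
{
    s->state = IORESP_READY;
}

/*
 * Resume path: acknowledge every posted response, then report whether all
 * slots are idle again (1) or some server is still busy (0).
 */
static int collect_responses(struct server_slot *list)
{
    int done = 1;
    struct server_slot *s;

    for ( s = list; s != NULL; s = s->next )
    {
        if ( s->state == IORESP_READY )
            s->state = IOREQ_NONE;
        if ( s->state != IOREQ_NONE )
            done = 0;
    }
    return done;
}
```

The vcpu may only resume once collect_responses() reports that every server has finished, which is why hvm_do_resume() in the patch walks the whole server list for IOREQ_TYPE_INVALIDATE.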
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 08/17] hvm-io: Handle server in buffered IO
As for normal I/O, Xen walks the ranges to find which server is able to
handle the I/O. IOREQ_TYPE_TIMEOFFSET is a special case: this I/O must be
sent to all servers. For this purpose, a new function,
hvm_buffered_io_send_to_server, is introduced. It sends an I/O to a
specific server.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 xen/arch/x86/hvm/io.c |   75 +++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 58 insertions(+), 17 deletions(-)

diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index b73a462..6e0160c 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -46,28 +46,17 @@
 #include <xen/iocap.h>
 #include <public/hvm/ioreq.h>
 
-int hvm_buffered_io_send(ioreq_t *p)
+static int hvm_buffered_io_send_to_server(ioreq_t *p, struct hvm_ioreq_server *s)
 {
     struct vcpu *v = current;
-    struct hvm_ioreq_page *iorp = &v->domain->arch.hvm_domain.buf_ioreq;
-    buffered_iopage_t *pg = iorp->va;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
     buf_ioreq_t bp;
     /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
     int qw = 0;
 
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
+    iorp = &s->buf_ioreq;
+    pg = iorp->va;
 
     bp.type = p->type;
     bp.dir = p->dir;
@@ -119,12 +108,64 @@ int hvm_buffered_io_send(ioreq_t *p)
     pg->write_pointer += qw ? 2 : 1;
 
     notify_via_xen_event_channel(v->domain,
-            v->domain->arch.hvm_domain.params[HVM_PARAM_BUFIOREQ_EVTCHN]);
+            s->buf_ioreq_evtchn);
     spin_unlock(&iorp->lock);
 
     return 1;
 }
 
+int hvm_buffered_io_send(ioreq_t *p)
+{
+    struct vcpu *v = current;
+    struct hvm_ioreq_server *s;
+    int rc = 1;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    spin_lock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+    if ( p->type == IOREQ_TYPE_TIMEOFFSET )
+    {
+        /* Send TIME OFFSET to all servers */
+        for ( s = v->domain->arch.hvm_domain.ioreq_server_list; s; s = s->next )
+            rc = hvm_buffered_io_send_to_server(p, s) && rc;
+    }
+    else
+    {
+        for ( s = v->domain->arch.hvm_domain.ioreq_server_list; s; s = s->next )
+        {
+            struct hvm_io_range *x = (p->type == IOREQ_TYPE_COPY)
+                ? s->mmio_range_list : s->portio_range_list;
+            for ( ; x; x = x->next )
+            {
+                if ( (p->addr >= x->s) && (p->addr <= x->e) )
+                {
+                    rc = hvm_buffered_io_send_to_server(p, s);
+                    spin_unlock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+
+                    return rc;
+                }
+            }
+        }
+        rc = 0;
+    }
+
+    spin_unlock(&v->domain->arch.hvm_domain.ioreq_server_lock);
+
+    return rc;
+}
+
 void send_timeoffset_req(unsigned long timeoff)
 {
     ioreq_t p[1];
-- 
Julien Grall
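
Editor's sketch of two details of hvm_buffered_io_send() in patch 08 above: the eligibility test applied before a request may be buffered, and the way the time-offset broadcast folds per-server results into one return code. The io_request structure and both function names are hypothetical; only the conditions are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the ioreq_t fields the check looks at (hypothetical). */
struct io_request {
    uint64_t addr;
    uint32_t count;
    int data_is_ptr;
};

/*
 * A request can be buffered only if the address fits the 20-bit field,
 * the data is passed by value, and exactly one item is transferred.
 */
static int can_buffer(const struct io_request *p)
{
    return !((p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1));
}

/*
 * Combine per-server send results the way the broadcast loop does:
 * every server is tried, and the result is 1 only if all sends succeeded.
 */
static int broadcast_result(const int *ok, unsigned int n)
{
    int rc = 1;
    unsigned int i;

    for ( i = 0; i < n; i++ )
        rc = ok[i] && rc;

    return rc;
}
```

Writing `rc = send(...) && rc` rather than `rc = rc && send(...)` matters in the patch: the call sits on the left of `&&`, so it is always evaluated and a failure on one server does not stop delivery to the others.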
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 09/17] xc: Add the hypercall for multiple servers
This patch adds five hypercalls to register a server, I/O ranges and PCI
devices.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 tools/libxc/xc_domain.c |  155 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxc/xenctrl.h   |   21 ++++++
 2 files changed, 176 insertions(+), 0 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index d98e68b..cb186c1 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1514,6 +1514,161 @@ int xc_domain_set_virq_handler(xc_interface *xch, uint32_t domid, int virq)
     return do_domctl(xch, &domctl);
 }
 
+ioservid_or_error_t xc_hvm_register_ioreq_server(xc_interface *xch,
+                                                 domid_t dom)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_register_ioreq_server_t, arg);
+    ioservid_or_error_t rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof (*arg));
+    if ( !arg )
+    {
+        PERROR("Could not allocate memory for xc_hvm_register_ioreq_server hypercall");
+        goto out;
+    }
+
+    hypercall.op = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_register_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = dom;
+    rc = do_xen_hypercall(xch, &hypercall);
+    if ( !rc )
+        rc = arg->id;
+
+    xc_hypercall_buffer_free(xch, arg);
+out:
+    return rc;
+}
+
+evtchn_port_or_error_t xc_hvm_get_ioreq_server_buf_channel(xc_interface *xch,
+                                                           domid_t dom,
+                                                           ioservid_t id)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_get_ioreq_server_buf_channel_t, arg);
+    evtchn_port_or_error_t rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof (*arg));
+    if ( !arg )
+    {
+        PERROR("Could not allocate memory for xc_hvm_get_ioreq_servr_buf_channel");
+        goto out;
+    }
+
+    hypercall.op = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_get_ioreq_server_buf_channel;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = dom;
+    arg->id = id;
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    if ( !rc )
+        rc = arg->channel;
+
+    xc_hypercall_buffer_free(xch, arg);
+
+out:
+    return rc;
+}
+
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t dom,
+                                        ioservid_t id, int is_mmio,
+                                        uint64_t start, uint64_t end)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_map_io_range_to_ioreq_server_t, arg);
+    int rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof (*arg));
+    if ( !arg )
+    {
+        PERROR("Could not allocate memory for xc_hvm_map_io_range_to_ioreq_server hypercall");
+        goto out;
+    }
+
+    hypercall.op = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = dom;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->s = start;
+    arg->e = end;
+
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+out:
+    return rc;
+}
+
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t dom,
+                                            ioservid_t id, int is_mmio,
+                                            uint64_t addr)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_unmap_io_range_from_ioreq_server_t, arg);
+    int rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof (*arg));
+    if ( !arg )
+    {
+        PERROR("Could not allocate memory for xc_hvm_unmap_io_range_from_ioreq_server hypercall");
+        goto out;
+    }
+
+    hypercall.op = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = dom;
+    arg->id = id;
+    arg->is_mmio = is_mmio;
+    arg->addr = addr;
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+out:
+    return rc;
+}
+
+int xc_hvm_register_pcidev(xc_interface *xch, domid_t dom, ioservid_t id,
+                           uint8_t domain, uint8_t bus, uint8_t device,
+                           uint8_t function)
+{
+    DECLARE_HYPERCALL;
+    DECLARE_HYPERCALL_BUFFER(xen_hvm_register_pcidev_t, arg);
+    int rc = -1;
+
+    arg = xc_hypercall_buffer_alloc(xch, arg, sizeof (*arg));
+    if ( !arg )
+    {
+        PERROR("Could not allocate memory for xc_hvm_create_pci hypercall");
+        goto out;
+    }
+
+    hypercall.op = __HYPERVISOR_hvm_op;
+    hypercall.arg[0] = HVMOP_register_pcidev;
+    hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+    arg->domid = dom;
+    arg->id = id;
+    arg->domain = domain;
+    arg->bus = bus;
+    arg->device = device;
+    arg->function = function;
+    rc = do_xen_hypercall(xch, &hypercall);
+
+    xc_hypercall_buffer_free(xch, arg);
+out:
+    return rc;
+}
+
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxc/xenctrl.h b/tools/libxc/xenctrl.h
index b7741ca..65a950e 100644
--- a/tools/libxc/xenctrl.h
+++ b/tools/libxc/xenctrl.h
@@ -1659,6 +1659,27 @@ void xc_clear_last_error(xc_interface *xch);
 int xc_set_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long value);
 int xc_get_hvm_param(xc_interface *handle, domid_t dom, int param, unsigned long *value);
 
+/* A IO server identifier is guaranteed to fit in 31 bits. */
+typedef int ioservid_or_error_t;
+
+ioservid_or_error_t xc_hvm_register_ioreq_server(xc_interface *xch,
+                                                 domid_t dom);
+evtchn_port_or_error_t xc_hvm_get_ioreq_server_buf_channel(xc_interface *xch,
+                                                           domid_t dom,
+                                                           ioservid_t id);
+int xc_hvm_map_io_range_to_ioreq_server(xc_interface *xch, domid_t dom,
+                                        ioservid_t id, int is_mmio,
+                                        uint64_t start, uint64_t end);
+int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t dom,
+                                            ioservid_t id, int is_mmio,
+                                            uint64_t addr);
+/*
+ * Register a PCI device
+ */
+int xc_hvm_register_pcidev(xc_interface *xch, domid_t dom, unsigned int id,
+                           uint8_t domain, uint8_t bus, uint8_t device,
+                           uint8_t function);
+
 /* IA64 specific, nvram save */
 int xc_ia64_save_to_nvram(xc_interface *xch, uint32_t dom);
 
-- 
Julien Grall
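
Editor's sketch of the "id or error" return convention used by xc_hvm_register_ioreq_server() in patch 09 above: the hypercall itself returns 0 on success and hands the identifier back through the argument buffer, and the wrapper folds both into one int. Everything here is a stand-in (fake_hypercall, register_arg and the fixed id 7 are hypothetical) so that the convention can be shown without a Xen host.

```c
#include <assert.h>
#include <stdint.h>

/* Same convention as the patch: >= 0 is a server id, < 0 is an error. */
typedef int ioservid_or_error_t;

/* Stand-in for the hypercall argument buffer (hypothetical layout). */
struct register_arg {
    uint16_t domid;
    uint32_t id;
};

/* Stand-in for do_xen_hypercall(): 0 on success, -1 on failure. */
static int fake_hypercall(struct register_arg *arg, int fail)
{
    if ( fail )
        return -1;
    arg->id = 7;    /* pretend the hypervisor allocated server id 7 */
    return 0;
}

static ioservid_or_error_t register_server(uint16_t domid, int fail)
{
    struct register_arg arg = { domid, 0 };
    int rc = fake_hypercall(&arg, fail);

    if ( !rc )
        rc = (int)arg.id;   /* safe because ids are stated to fit in 31 bits */

    return rc;
}
```

A caller therefore only needs one check, `if ( id < 0 )`, before passing the id on to the range-mapping and PCI registration calls.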
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 10/17] xc: Add argument to allocate more special pages
This patch makes it possible to allocate more special pages. With multiple
ioreq servers, each server needs two shared pages.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 tools/libxc/xc_hvm_build_x86.c    |   59 +++++++++++++++++++-----------------
 tools/libxc/xenguest.h            |    4 ++-
 tools/python/xen/lowlevel/xc/xc.c |    3 +-
 3 files changed, 36 insertions(+), 30 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index cf5d7fb..b98536b 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -41,16 +41,15 @@
 #define SPECIALPAGE_PAGING   0
 #define SPECIALPAGE_ACCESS   1
 #define SPECIALPAGE_SHARING  2
-#define SPECIALPAGE_BUFIOREQ 3
-#define SPECIALPAGE_XENSTORE 4
-#define SPECIALPAGE_IOREQ    5
-#define SPECIALPAGE_IDENT_PT 6
-#define SPECIALPAGE_CONSOLE  7
-#define NR_SPECIAL_PAGES     8
-#define special_pfn(x) (0xff000u - NR_SPECIAL_PAGES + (x))
+#define SPECIALPAGE_XENSTORE 3
+#define SPECIALPAGE_IDENT_PT 4
+#define SPECIALPAGE_CONSOLE  5
+#define NR_SPECIAL_PAGES     6
+#define special_pfn(x, add) (0xff000u - (NR_SPECIAL_PAGES + (add)) + (x))
 
 static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
-                           uint64_t mmio_start, uint64_t mmio_size)
+                           uint64_t mmio_start, uint64_t mmio_size,
+                           uint32_t nr_special_pages)
 {
     struct hvm_info_table *hvm_info = (struct hvm_info_table *)
         (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
@@ -78,7 +77,7 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
     /* Memory parameters. */
     hvm_info->low_mem_pgend = lowmem_end >> PAGE_SHIFT;
     hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
-    hvm_info->reserved_mem_pgstart = special_pfn(0);
+    hvm_info->reserved_mem_pgstart = special_pfn(0, nr_special_pages);
 
     /* Finish with the checksum. */
     for ( i = 0, sum = 0; i < hvm_info->length; i++ )
@@ -148,6 +147,7 @@ static int setup_guest(xc_interface *xch,
     unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
     uint64_t mmio_start = (1ull << 32) - args->mmio_size;
     uint64_t mmio_size = args->mmio_size;
+    uint32_t nr_special_pages = args->nr_special_pages;
     unsigned long entry_eip, cur_pages, cur_pfn;
     void *hvm_info_page;
     uint32_t *ident_pt;
@@ -341,37 +341,38 @@ static int setup_guest(xc_interface *xch,
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
               HVM_INFO_PFN)) == NULL )
         goto error_out;
-    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size);
+    build_hvm_info(hvm_info_page, v_end, mmio_start, mmio_size, nr_special_pages);
     munmap(hvm_info_page, PAGE_SIZE);
 
     /* Allocate and clear special pages. */
-    for ( i = 0; i < NR_SPECIAL_PAGES; i++ )
+    for ( i = 0; i < (NR_SPECIAL_PAGES + nr_special_pages); i++ )
     {
-        xen_pfn_t pfn = special_pfn(i);
+        xen_pfn_t pfn = special_pfn(i, nr_special_pages);
         rc = xc_domain_populate_physmap_exact(xch, dom, 1, 0, 0, &pfn);
         if ( rc != 0 )
         {
             PERROR("Could not allocate %d'th special page.", i);
             goto error_out;
         }
-        if ( xc_clear_domain_page(xch, dom, special_pfn(i)) )
+        if ( xc_clear_domain_page(xch, dom, special_pfn(i, nr_special_pages)) )
             goto error_out;
     }
 
     xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN,
-                     special_pfn(SPECIALPAGE_XENSTORE));
-    xc_set_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN,
-                     special_pfn(SPECIALPAGE_BUFIOREQ));
-    xc_set_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN,
-                     special_pfn(SPECIALPAGE_IOREQ));
+                     special_pfn(SPECIALPAGE_XENSTORE, nr_special_pages));
     xc_set_hvm_param(xch, dom, HVM_PARAM_CONSOLE_PFN,
-                     special_pfn(SPECIALPAGE_CONSOLE));
+                     special_pfn(SPECIALPAGE_CONSOLE, nr_special_pages));
     xc_set_hvm_param(xch, dom, HVM_PARAM_PAGING_RING_PFN,
-                     special_pfn(SPECIALPAGE_PAGING));
+                     special_pfn(SPECIALPAGE_PAGING, nr_special_pages));
     xc_set_hvm_param(xch, dom, HVM_PARAM_ACCESS_RING_PFN,
-                     special_pfn(SPECIALPAGE_ACCESS));
+                     special_pfn(SPECIALPAGE_ACCESS, nr_special_pages));
     xc_set_hvm_param(xch, dom, HVM_PARAM_SHARING_RING_PFN,
-                     special_pfn(SPECIALPAGE_SHARING));
+                     special_pfn(SPECIALPAGE_SHARING, nr_special_pages));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_IO_PFN_FIRST,
+                     special_pfn(NR_SPECIAL_PAGES, nr_special_pages));
+    xc_set_hvm_param(xch, dom, HVM_PARAM_IO_PFN_LAST,
+                     special_pfn(NR_SPECIAL_PAGES + nr_special_pages - 1,
+                                 nr_special_pages));
 
     /*
      * Identity-map page table is required for running with CR0.PG=0 when
@@ -379,14 +380,14 @@ static int setup_guest(xc_interface *xch,
      */
     if ( (ident_pt = xc_map_foreign_range(
               xch, dom, PAGE_SIZE, PROT_READ | PROT_WRITE,
-              special_pfn(SPECIALPAGE_IDENT_PT))) == NULL )
+              special_pfn(SPECIALPAGE_IDENT_PT, nr_special_pages))) == NULL )
         goto error_out;
     for ( i = 0; i < PAGE_SIZE / sizeof(*ident_pt); i++ )
         ident_pt[i] = ((i << 22) | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
                        _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
     munmap(ident_pt, PAGE_SIZE);
     xc_set_hvm_param(xch, dom, HVM_PARAM_IDENT_PT,
-                     special_pfn(SPECIALPAGE_IDENT_PT) << PAGE_SHIFT);
+                     special_pfn(SPECIALPAGE_IDENT_PT, nr_special_pages) << PAGE_SHIFT);
 
     /* Insert JMP <rel32> instruction at address 0x0 to reach entry point. */
     entry_eip = elf_uval(&elf, elf.ehdr, e_entry);
@@ -454,16 +455,18 @@ int xc_hvm_build(xc_interface *xch, uint32_t domid,
  * If target == memsize, pages are populated normally.
  */
 int xc_hvm_build_target_mem(xc_interface *xch,
-                           uint32_t domid,
-                           int memsize,
-                           int target,
-                           const char *image_name)
+                            uint32_t domid,
+                            int memsize,
+                            int target,
+                            const char *image_name,
+                            uint32_t nr_special_pages)
 {
     struct xc_hvm_build_args args = {};
 
     args.mem_size = (uint64_t)memsize << 20;
     args.mem_target = (uint64_t)target << 20;
     args.image_file_name = image_name;
+    args.nr_special_pages = nr_special_pages;
 
     return xc_hvm_build(xch, domid, &args);
 }
diff --git a/tools/libxc/xenguest.h b/tools/libxc/xenguest.h
index 707e31c..9a0d38f 100644
--- a/tools/libxc/xenguest.h
+++ b/tools/libxc/xenguest.h
@@ -216,6 +216,7 @@ struct xc_hvm_build_args {
     uint64_t mem_target;         /* Memory target in bytes. */
     uint64_t mmio_size;          /* Size of the MMIO hole in bytes. */
     const char *image_file_name; /* File name of the image to load. */
+    uint32_t nr_special_pages;   /* Additional special pages for io daemon */
 };
 
 /**
@@ -234,7 +235,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
                             uint32_t domid,
                             int memsize,
                             int target,
-                            const char *image_name);
+                            const char *image_name,
+                            uint32_t nr_special_pages);
 
 int xc_suspend_evtchn_release(xc_interface *xch, xc_evtchn *xce, int domid, int suspend_evtchn);
 
diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c
index 7c89756..eb004b6 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -984,8 +984,9 @@ static PyObject *pyxc_hvm_build(XcObject *self,
     if ( target == -1 )
         target = memsize;
 
+    // Ugly fix : we must retrieve the number of servers
     if ( xc_hvm_build_target_mem(self->xc_handle, dom, memsize,
-                                 target, image) != 0 )
+                                 target, image, 0) != 0 )
         return pyxc_error_to_exception(self->xc_handle);
 
 #if !defined(__ia64__)
-- 
Julien Grall
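
Editor's worked example of the special_pfn() arithmetic from patch 10 above. The two macros are copied from the diff; the helper functions and the choice of two servers are illustrative assumptions. The fixed special pages keep their indices, and the extra I/O pages are placed after them, ending just below PFN 0xff000.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the patch. */
#define NR_SPECIAL_PAGES     6
#define special_pfn(x, add) (0xff000u - (NR_SPECIAL_PAGES + (add)) + (x))

/* Two shared pages per ioreq server, as stated in the commit message. */
static uint32_t io_pages_for(uint32_t nr_servers)
{
    return 2 * nr_servers;
}

/* First PFN of the I/O page window (value given to HVM_PARAM_IO_PFN_FIRST). */
static uint32_t io_pfn_first(uint32_t nr_io_pages)
{
    return special_pfn(NR_SPECIAL_PAGES, nr_io_pages);
}

/* Last PFN of the I/O page window (value given to HVM_PARAM_IO_PFN_LAST). */
static uint32_t io_pfn_last(uint32_t nr_io_pages)
{
    return special_pfn(NR_SPECIAL_PAGES + nr_io_pages - 1, nr_io_pages);
}
```

For two servers, four extra pages are reserved: the special region starts at 0xff000 - 10 = 0xfeff6, and the I/O window covers PFNs 0xfeffc to 0xfefff inclusive.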
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 11/17] xc: modify save/restore to support multiple device models
- add save/restore new special pages and remove unused - modify save file structure to allow multiple qemu states Signed-off-by: Julien Grall <julien.grall@citrix.com> --- tools/libxc/xc_domain_restore.c | 150 +++++++++++++++++++++++++++++---------- tools/libxc/xc_domain_save.c | 6 +- 2 files changed, 116 insertions(+), 40 deletions(-) diff --git a/tools/libxc/xc_domain_restore.c b/tools/libxc/xc_domain_restore.c index 3fe2b12..9a49ee2 100644 --- a/tools/libxc/xc_domain_restore.c +++ b/tools/libxc/xc_domain_restore.c @@ -103,6 +103,9 @@ static ssize_t rdexact(xc_interface *xch, struct restore_ctx *ctx, #else #define RDEXACT read_exact #endif + +#define QEMUSIG_SIZE 21 + /* ** In the state file (or during transfer), all page-table pages are ** converted into a ''canonical'' form where references to actual mfns @@ -342,8 +345,11 @@ typedef struct { uint32_t version; uint64_t len; } qemuhdr; - uint32_t qemubufsize; - uint8_t* qemubuf; + uint32_t num_dms; + struct devmodel_buffer { + uint32_t size; + uint8_t* buf; + } *dmsbuf; } hvm; } u; } tailbuf_t; @@ -392,63 +398,112 @@ static int compat_buffer_qemu(xc_interface *xch, struct restore_ctx *ctx, return -1; } - buf->qemubuf = qbuf; - buf->qemubufsize = dlen; + if ( !(buf->dmsbuf = calloc(1, sizeof(*buf->dmsbuf))) ) { + ERROR("Error allocating Device Model buffer"); + free(qbuf); + return -1; + } + + buf->dmsbuf[0].buf = qbuf; + buf->dmsbuf[0].size = dlen; + buf->num_dms = 1; return 0; } static int buffer_qemu(xc_interface *xch, struct restore_ctx *ctx, - int fd, struct tailbuf_hvm *buf) + uint32_t dmid, int fd, struct tailbuf_hvm *buf) { uint32_t qlen; uint8_t *tmp; + struct devmodel_buffer *dmb = &buf->dmsbuf[dmid]; if ( RDEXACT(fd, &qlen, sizeof(qlen)) ) { - PERROR("Error reading QEMU header length"); + PERROR("Error reading Device Model %u header length", dmid); return -1; } - if ( qlen > buf->qemubufsize ) { - if ( buf->qemubuf) { - tmp = realloc(buf->qemubuf, qlen); + if ( qlen > dmb->size ) { + if ( dmb->buf ) { 
+ tmp = realloc(dmb->buf, qlen); if ( tmp ) - buf->qemubuf = tmp; + dmb->buf = tmp; else { - ERROR("Error reallocating QEMU state buffer"); + ERROR("Error reallocating Device Model %u state buffer", dmid); return -1; } } else { - buf->qemubuf = malloc(qlen); - if ( !buf->qemubuf ) { - ERROR("Error allocating QEMU state buffer"); + dmb->buf = malloc(qlen); + if ( !dmb->buf ) { + ERROR("Error allocating Device Model %u state buffer", dmid); return -1; } } } - buf->qemubufsize = qlen; + dmb->size = qlen; - if ( RDEXACT(fd, buf->qemubuf, buf->qemubufsize) ) { - PERROR("Error reading QEMU state"); + if ( RDEXACT(fd, dmb->buf, dmb->size) ) { + PERROR("Error reading Device Model %u state", dmid); return -1; } return 0; } -static int dump_qemu(xc_interface *xch, uint32_t dom, struct tailbuf_hvm *buf) +static int buffer_device_models(xc_interface *xch, struct restore_ctx *ctx, + int fd, struct tailbuf_hvm *buf) +{ + uint32_t i, num_dms; + unsigned char qemusig[QEMUSIG_SIZE + 1]; + int ret = 0; + + if ( RDEXACT(fd, &num_dms, sizeof(num_dms)) ) { + PERROR("Error reading num dms"); + return -1; + } + + if ( !(buf->dmsbuf = calloc(num_dms, sizeof (*buf->dmsbuf))) ) { + PERROR("Error allocating Device Model buffers"); + return -1; + } + + buf->num_dms = num_dms; + + for ( i = 0; i < num_dms; i++ ) { + if ( RDEXACT(fd, qemusig, QEMUSIG_SIZE) ) { + PERROR("Error reading Device Model %u signature", i); + return -1; + } + + if ( memcmp(qemusig, "DeviceModelRecord0002", QEMUSIG_SIZE) ) { + qemusig[QEMUSIG_SIZE] = ''\0''; + ERROR("Invalid Device Model %u signature: %s", i, qemusig); + return -1; + } + + ret = buffer_qemu(xch, ctx, i, fd, buf); + if ( ret ) + return ret; + } + + return 0; +} + +static int dump_qemu(xc_interface *xch, uint32_t dom, + uint32_t dmid, struct tailbuf_hvm *buf) { int saved_errno; char path[256]; FILE *fp; + struct devmodel_buffer *dmb = &buf->dmsbuf[dmid]; - sprintf(path, XC_DEVICE_MODEL_RESTORE_FILE".%u", dom); + sprintf(path, 
XC_DEVICE_MODEL_RESTORE_FILE".%u.%u", dom, dmid); fp = fopen(path, "wb"); if ( !fp ) return -1; - DPRINTF("Writing %d bytes of QEMU data\n", buf->qemubufsize); - if ( fwrite(buf->qemubuf, 1, buf->qemubufsize, fp) != buf->qemubufsize) { + DPRINTF("Writing %d bytes of Device Model %u data\n", dmb->size, dmid); + if ( fwrite(dmb->buf, 1, dmb->size, fp) != dmb->size ) { saved_errno = errno; fclose(fp); errno = saved_errno; @@ -467,7 +522,7 @@ static int buffer_tail_hvm(xc_interface *xch, struct restore_ctx *ctx, int vcpuextstate, uint32_t vcpuextstate_size) { uint8_t *tmp; - unsigned char qemusig[21]; + unsigned char qemusig[QEMUSIG_SIZE + 1]; if ( RDEXACT(fd, buf->magicpfns, sizeof(buf->magicpfns)) ) { PERROR("Error reading magic PFNs"); @@ -504,7 +559,7 @@ static int buffer_tail_hvm(xc_interface *xch, struct restore_ctx *ctx, return -1; } - if ( RDEXACT(fd, qemusig, sizeof(qemusig)) ) { + if ( RDEXACT(fd, qemusig, QEMUSIG_SIZE) ) { PERROR("Error reading QEMU signature"); return -1; } @@ -517,13 +572,22 @@ static int buffer_tail_hvm(xc_interface *xch, struct restore_ctx *ctx, * live-migration QEMU record and Remus which includes a length * prefix */ - if ( !memcmp(qemusig, "QemuDeviceModelRecord", sizeof(qemusig)) ) + if ( !memcmp(qemusig, "QemuDeviceModelRecord", QEMUSIG_SIZE) ) return compat_buffer_qemu(xch, ctx, fd, buf); - else if ( !memcmp(qemusig, "DeviceModelRecord0002", sizeof(qemusig)) || - !memcmp(qemusig, "RemusDeviceModelState", sizeof(qemusig)) ) - return buffer_qemu(xch, ctx, fd, buf); + else if ( !memcmp(qemusig, "DeviceModelRecord0002", QEMUSIG_SIZE) || + !memcmp(qemusig, "RemusDeviceModelState", QEMUSIG_SIZE) ) + { + if ( !(buf->dmsbuf = calloc(1, sizeof (*buf->dmsbuf))) ) { + PERROR("Error allocating Device Model buffer"); + return -1; + } + return buffer_qemu(xch, ctx, 0, fd, buf); + } + else if ( !memcmp(qemusig, "DeviceModelRecords001", QEMUSIG_SIZE) ) { + return buffer_device_models(xch, ctx, fd, buf); + } - qemusig[20] = ''\0''; + 
qemusig[QEMUSIG_SIZE] = ''\0''; ERROR("Invalid QEMU signature: %s", qemusig); return -1; } @@ -629,13 +693,18 @@ static int buffer_tail(xc_interface *xch, struct restore_ctx *ctx, static void tailbuf_free_hvm(struct tailbuf_hvm *buf) { + uint32_t i; + if ( buf->hvmbuf ) { free(buf->hvmbuf); buf->hvmbuf = NULL; } - if ( buf->qemubuf ) { - free(buf->qemubuf); - buf->qemubuf = NULL; + + for (i = 0; i < buf->num_dms; i++) + { + if (buf->dmsbuf[i].buf) + free(buf->dmsbuf[i].buf); + buf->dmsbuf[i].buf = NULL; } } @@ -2137,10 +2206,17 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, } } - /* Dump the QEMU state to a state file for QEMU to load */ - if ( dump_qemu(xch, dom, &tailbuf.u.hvm) ) { - PERROR("Error dumping QEMU state to file"); - goto out; + /** + * Dump the each Device Model state to a state file for the Device + * Model to load + */ + for ( i = 0; i < tailbuf.u.hvm.num_dms; i++) + { + if ( dump_qemu(xch, dom, i, &tailbuf.u.hvm) ) + { + PERROR("Error dumping Device Model %u state to file", i); + goto out; + } } /* These comms pages need to be zeroed at the start of day */ @@ -2153,9 +2229,9 @@ int xc_domain_restore(xc_interface *xch, int io_fd, uint32_t dom, } if ( (frc = xc_set_hvm_param(xch, dom, - HVM_PARAM_IOREQ_PFN, tailbuf.u.hvm.magicpfns[0])) + HVM_PARAM_IO_PFN_FIRST, tailbuf.u.hvm.magicpfns[0])) || (frc = xc_set_hvm_param(xch, dom, - HVM_PARAM_BUFIOREQ_PFN, tailbuf.u.hvm.magicpfns[1])) + HVM_PARAM_IO_PFN_LAST, tailbuf.u.hvm.magicpfns[1])) || (frc = xc_set_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, tailbuf.u.hvm.magicpfns[2])) || (frc = xc_set_hvm_param(xch, dom, diff --git a/tools/libxc/xc_domain_save.c b/tools/libxc/xc_domain_save.c index c359649..2aa7a28 100644 --- a/tools/libxc/xc_domain_save.c +++ b/tools/libxc/xc_domain_save.c @@ -862,7 +862,7 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter uint8_t *hvm_buf = NULL; /* HVM: magic frames for ioreqs and xenstore comms. 
*/ - uint64_t magic_pfns[3]; /* ioreq_pfn, bufioreq_pfn, store_pfn */ + uint64_t magic_pfns[3]; /* io_pfn_first , io_pfn_last, store_pfn */ unsigned long mfn; @@ -1787,9 +1787,9 @@ int xc_domain_save(xc_interface *xch, int io_fd, uint32_t dom, uint32_t max_iter /* Save magic-page locations. */ memset(magic_pfns, 0, sizeof(magic_pfns)); - xc_get_hvm_param(xch, dom, HVM_PARAM_IOREQ_PFN, + xc_get_hvm_param(xch, dom, HVM_PARAM_IO_PFN_FIRST, (unsigned long *)&magic_pfns[0]); - xc_get_hvm_param(xch, dom, HVM_PARAM_BUFIOREQ_PFN, + xc_get_hvm_param(xch, dom, HVM_PARAM_IO_PFN_LAST, (unsigned long *)&magic_pfns[1]); xc_get_hvm_param(xch, dom, HVM_PARAM_STORE_PFN, (unsigned long *)&magic_pfns[2]); -- Julien Grall
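For readers following the restore path, the signature dispatch in buffer_tail_hvm() can be looked at in isolation. The following is only a minimal sketch, not the libxc code: classify_dm_signature() and the dm_record_kind enum are hypothetical names, and QEMUSIG_SIZE is assumed to be 21, which is the length of each signature string and matches the old qemusig[21] buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: 21 bytes, the length of every signature string below. */
#define QEMUSIG_SIZE 21

/* Hypothetical classification of the device model record that follows. */
enum dm_record_kind {
    DM_RECORD_INVALID = 0,
    DM_RECORD_COMPAT,   /* "QemuDeviceModelRecord": old record, no length prefix */
    DM_RECORD_SINGLE,   /* one length-prefixed device model record */
    DM_RECORD_MULTI     /* "DeviceModelRecords001": several device models */
};

/* sig must point to at least QEMUSIG_SIZE readable bytes. */
static enum dm_record_kind classify_dm_signature(const unsigned char *sig)
{
    assert(sig != NULL);

    if (!memcmp(sig, "QemuDeviceModelRecord", QEMUSIG_SIZE))
        return DM_RECORD_COMPAT;
    if (!memcmp(sig, "DeviceModelRecord0002", QEMUSIG_SIZE) ||
        !memcmp(sig, "RemusDeviceModelState", QEMUSIG_SIZE))
        return DM_RECORD_SINGLE;
    if (!memcmp(sig, "DeviceModelRecords001", QEMUSIG_SIZE))
        return DM_RECORD_MULTI;
    return DM_RECORD_INVALID;
}
```

Comparing exactly QEMUSIG_SIZE bytes and keeping one extra byte in the buffer for the terminator is what lets the error path print the whole signature; the old code wrote the '\0' at index 20 and so cut off the last character.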
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 12/17] xl: Add interface to handle qemu disaggregation
This patch modifies the libxl interface for QEMU disaggregation.
For the moment, due to some dependencies between devices, we can't let the
user choose which QEMU emulates a device.

Moreover, this patch adds an "id" field to the nic interface. It will be
used in the config file to specify which QEMU handles the network card.

A possible disaggregation is:
    - UI: Emulate graphic card, USB, keyboard, mouse, default devices
      (PIIX4, root bridge, ...)
    - IDE: Emulate disk
    - Serial: Emulate serial port
    - Audio: Emulate audio card
    - Net: Emulate one or more network cards; multiple QEMUs can emulate
      different cards. The emulated card is specified with its nic ID.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 tools/libxl/libxl.h         |    3 +++
 tools/libxl/libxl_types.idl |   15 +++++++++++++++
 2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index c614d6f..71d4808 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -307,6 +307,7 @@ void libxl_cpuid_dispose(libxl_cpuid_policy_list *cpuid_list);
 #define LIBXL_PCI_FUNC_ALL (~0U)
 
 typedef uint32_t libxl_domid;
+typedef uint32_t libxl_dmid;
 
 /*
  * Formatting Enumerations.
@@ -478,12 +479,14 @@ typedef struct {
     libxl_domain_build_info b_info;
 
     int num_disks, num_nics, num_pcidevs, num_vfbs, num_vkbs;
+    int num_dms;
 
     libxl_device_disk *disks;
     libxl_device_nic *nics;
     libxl_device_pci *pcidevs;
     libxl_device_vfb *vfbs;
     libxl_device_vkb *vkbs;
+    libxl_dm *dms;
 
     libxl_action_on_shutdown on_poweroff;
     libxl_action_on_shutdown on_reboot;
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index daa8c79..36c802a 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -246,6 +246,20 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
     ("extratime", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
     ])
 
+libxl_dm_cap = Enumeration("dm_cap", [
+    (1, "UI"), # Emulate all UI + default device
+    (2, "IDE"), # Emulate IDE
+    (4, "SERIAL"), # Emulate Serial
+    (8, "AUDIO"), # Emulate audio
+    ])
+
+libxl_dm = Struct("dm", [
+    ("name", string),
+    ("path", string),
+    ("capabilities", uint64),
+    ("vifs", libxl_string_list),
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus", integer),
     ("avail_vcpus", libxl_bitmap),
@@ -367,6 +381,7 @@ libxl_device_nic = Struct("device_nic", [
     ("nictype", libxl_nic_type),
     ("rate_bytes_per_interval", uint64),
     ("rate_interval_usecs", uint32),
+    ("id", string),
     ])
 
 libxl_device_pci = Struct("device_pci", [
--
Julien Grall
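Since the capability values in libxl_dm_cap are powers of two and "capabilities" is a uint64, they are meant to be OR-ed together. As a rough illustration of how a set of device models could be checked for overlapping capabilities, here is a standalone sketch; the DM_CAP_* names and dm_caps_union() are hypothetical and only mirror the values in the IDL above, they are not the generated libxl identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the libxl_dm_cap enumeration; the names are illustrative. */
enum {
    DM_CAP_UI     = 1,
    DM_CAP_IDE    = 2,
    DM_CAP_SERIAL = 4,
    DM_CAP_AUDIO  = 8
};

/*
 * Compute the union of the capabilities claimed by n device models.
 * Returns 0 and stores the union in *claimed if no capability is claimed
 * twice, or -1 as soon as two device models claim the same capability.
 */
static int dm_caps_union(const uint64_t *caps, size_t n, uint64_t *claimed)
{
    uint64_t seen = 0;
    size_t i;

    assert(caps != NULL && claimed != NULL);

    for (i = 0; i < n; i++) {
        if (seen & caps[i])
            return -1;      /* already emulated by another device model */
        seen |= caps[i];
    }

    *claimed = seen;
    return 0;
}
```

With this, a configuration such as "ui,audio" in one QEMU and "ide" in another yields the union 11, while two QEMUs both asking for "ui" are rejected.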
Julien Grall
2012-Aug-22 12:31 UTC
[XEN][RFC PATCH V2 13/17] xl: add device model id to qmp functions
With support for multiple device models, the QMP library needs to know
which device model is currently in use.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 tools/libxl/libxl_internal.h |   24 ++++++++++++-------
 tools/libxl/libxl_qmp.c      |   49 ++++++++++++++++++++++++------------------
 2 files changed, 43 insertions(+), 30 deletions(-)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7c3b179..71e4970 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -1384,26 +1384,32 @@ typedef struct libxl__qmp_handler libxl__qmp_handler;
  * Return an handler or NULL if there is an error
  */
 _hidden libxl__qmp_handler *libxl__qmp_initialize(libxl__gc *gc,
-                                                  uint32_t domid);
+                                                  libxl_domid domid,
+                                                  libxl_dmid dmid);
 /* ask to QEMU the serial port information and store it in xenstore. */
 _hidden int libxl__qmp_query_serial(libxl__qmp_handler *qmp);
-_hidden int libxl__qmp_pci_add(libxl__gc *gc, int d, libxl_device_pci *pcidev);
-_hidden int libxl__qmp_pci_del(libxl__gc *gc, int domid,
-                               libxl_device_pci *pcidev);
+_hidden int libxl__qmp_pci_add(libxl__gc *gc, libxl_domid d,
+                               libxl_dmid dmid, libxl_device_pci *pcidev);
+_hidden int libxl__qmp_pci_del(libxl__gc *gc, libxl_domid domid,
+                               libxl_dmid dmid, libxl_device_pci *pcidev);
 /* Suspend QEMU. */
-_hidden int libxl__qmp_stop(libxl__gc *gc, int domid);
+_hidden int libxl__qmp_stop(libxl__gc *gc, libxl_domid domid, libxl_dmid dmid);
 /* Resume QEMU. */
-_hidden int libxl__qmp_resume(libxl__gc *gc, int domid);
+_hidden int libxl__qmp_resume(libxl__gc *gc, libxl_domid domid,
+                              libxl_dmid dmid);
 /* Save current QEMU state into fd.
*/ -_hidden int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename); +_hidden int libxl__qmp_save(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid, const char *filename); /* close and free the QMP handler */ _hidden void libxl__qmp_close(libxl__qmp_handler *qmp); /* remove the socket file, if the file has already been removed, * nothing happen */ -_hidden void libxl__qmp_cleanup(libxl__gc *gc, uint32_t domid); +_hidden void libxl__qmp_cleanup(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid); /* this helper calls qmp_initialize, query_serial and qmp_close */ -_hidden int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid, +_hidden int libxl__qmp_initializations(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid, const libxl_domain_config *guest_config); /* on failure, logs */ diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c index e33b130..3c3cccf 100644 --- a/tools/libxl/libxl_qmp.c +++ b/tools/libxl/libxl_qmp.c @@ -627,7 +627,8 @@ static void qmp_free_handler(libxl__qmp_handler *qmp) * API */ -libxl__qmp_handler *libxl__qmp_initialize(libxl__gc *gc, uint32_t domid) +libxl__qmp_handler *libxl__qmp_initialize(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid) { int ret = 0; libxl__qmp_handler *qmp = NULL; @@ -635,8 +636,8 @@ libxl__qmp_handler *libxl__qmp_initialize(libxl__gc *gc, uint32_t domid) qmp = qmp_init_handler(gc, domid); - qmp_socket = libxl__sprintf(gc, "%s/qmp-libxl-%d", - libxl__run_dir_path(), domid); + qmp_socket = libxl__sprintf(gc, "%s/qmp-libxl-%u-%u", + libxl__run_dir_path(), domid, dmid); if ((ret = qmp_open(qmp, qmp_socket, QMP_SOCKET_CONNECT_TIMEOUT)) < 0) { LIBXL__LOG_ERRNO(qmp->ctx, LIBXL__LOG_ERROR, "Connection error"); qmp_free_handler(qmp); @@ -668,13 +669,13 @@ void libxl__qmp_close(libxl__qmp_handler *qmp) qmp_free_handler(qmp); } -void libxl__qmp_cleanup(libxl__gc *gc, uint32_t domid) +void libxl__qmp_cleanup(libxl__gc *gc, libxl_domid domid, libxl_dmid dmid) { libxl_ctx *ctx = 
libxl__gc_owner(gc); char *qmp_socket; - qmp_socket = libxl__sprintf(gc, "%s/qmp-libxl-%d", - libxl__run_dir_path(), domid); + qmp_socket = libxl__sprintf(gc, "%s/qmp-libxl-%u-%u", + libxl__run_dir_path(), domid, dmid); if (unlink(qmp_socket) == -1) { if (errno != ENOENT) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, @@ -746,7 +747,9 @@ out: return rc; } -int libxl__qmp_pci_add(libxl__gc *gc, int domid, libxl_device_pci *pcidev) +int libxl__qmp_pci_add(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid, + libxl_device_pci *pcidev) { libxl__qmp_handler *qmp = NULL; flexarray_t *parameters = NULL; @@ -754,7 +757,7 @@ int libxl__qmp_pci_add(libxl__gc *gc, int domid, libxl_device_pci *pcidev) char *hostaddr = NULL; int rc = 0; - qmp = libxl__qmp_initialize(gc, domid); + qmp = libxl__qmp_initialize(gc, domid, 0); if (!qmp) return -1; @@ -792,14 +795,15 @@ int libxl__qmp_pci_add(libxl__gc *gc, int domid, libxl_device_pci *pcidev) return rc; } -static int qmp_device_del(libxl__gc *gc, int domid, char *id) +static int qmp_device_del(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid, char *id) { libxl__qmp_handler *qmp = NULL; flexarray_t *parameters = NULL; libxl_key_value_list args = NULL; int rc = 0; - qmp = libxl__qmp_initialize(gc, domid); + qmp = libxl__qmp_initialize(gc, domid, 0); if (!qmp) return ERROR_FAIL; @@ -817,24 +821,26 @@ static int qmp_device_del(libxl__gc *gc, int domid, char *id) return rc; } -int libxl__qmp_pci_del(libxl__gc *gc, int domid, libxl_device_pci *pcidev) +int libxl__qmp_pci_del(libxl__gc *gc, libxl_domid domid, + libxl_domid dmid, libxl_device_pci *pcidev) { char *id = NULL; id = libxl__sprintf(gc, PCI_PT_QDEV_ID, pcidev->bus, pcidev->dev, pcidev->func); - return qmp_device_del(gc, domid, id); + return qmp_device_del(gc, domid, dmid, id); } -int libxl__qmp_save(libxl__gc *gc, int domid, const char *filename) +int libxl__qmp_save(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid, const char *filename) { libxl__qmp_handler *qmp = NULL; 
flexarray_t *parameters = NULL; libxl_key_value_list args = NULL; int rc = 0; - qmp = libxl__qmp_initialize(gc, domid); + qmp = libxl__qmp_initialize(gc, domid, dmid); if (!qmp) return ERROR_FAIL; @@ -883,12 +889,12 @@ static int qmp_change(libxl__gc *gc, libxl__qmp_handler *qmp, return rc; } -int libxl__qmp_stop(libxl__gc *gc, int domid) +int libxl__qmp_stop(libxl__gc *gc, libxl_domid domid, libxl_dmid dmid) { libxl__qmp_handler *qmp = NULL; int rc = 0; - qmp = libxl__qmp_initialize(gc, domid); + qmp = libxl__qmp_initialize(gc, domid, dmid); if (!qmp) return ERROR_FAIL; @@ -899,12 +905,12 @@ int libxl__qmp_stop(libxl__gc *gc, int domid) return rc; } -int libxl__qmp_resume(libxl__gc *gc, int domid) +int libxl__qmp_resume(libxl__gc *gc, libxl_domid domid, libxl_dmid dmid) { libxl__qmp_handler *qmp = NULL; int rc = 0; - qmp = libxl__qmp_initialize(gc, domid); + qmp = libxl__qmp_initialize(gc, domid, dmid); if (!qmp) return ERROR_FAIL; @@ -915,14 +921,15 @@ int libxl__qmp_resume(libxl__gc *gc, int domid) return rc; } -int libxl__qmp_initializations(libxl__gc *gc, uint32_t domid, +int libxl__qmp_initializations(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid, const libxl_domain_config *guest_config) { - const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config); + const libxl_vnc_info *vnc = libxl__dm_vnc(dmid, guest_config); libxl__qmp_handler *qmp = NULL; int ret = 0; - qmp = libxl__qmp_initialize(gc, domid); + qmp = libxl__qmp_initialize(gc, domid, dmid); if (!qmp) return -1; ret = libxl__qmp_query_serial(qmp); -- Julien Grall
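The visible effect of this patch is that every device model gets its own QMP socket, named after both the domain id and the device model id. A small standalone sketch of that naming scheme follows; dm_qmp_socket_path() is a hypothetical helper, and the run directory passed by the caller stands in for whatever libxl__run_dir_path() returns.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build "<run_dir>/qmp-libxl-<domid>-<dmid>" into buf.
 * Returns the length written (excluding the terminator), or -1 if the
 * path does not fit in len bytes or formatting fails.
 */
static int dm_qmp_socket_path(char *buf, size_t len, const char *run_dir,
                              uint32_t domid, uint32_t dmid)
{
    int n;

    assert(buf != NULL && run_dir != NULL);

    n = snprintf(buf, len, "%s/qmp-libxl-%" PRIu32 "-%" PRIu32,
                 run_dir, domid, dmid);
    if (n < 0 || (size_t)n >= len)
        return -1;  /* formatting error or truncated path */
    return n;
}
```

Both the code that opens the socket and the code that unlinks it have to agree on this format, which is why libxl__qmp_initialize() and libxl__qmp_cleanup() change together in the diff above.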
Julien Grall
2012-Aug-22 12:32 UTC
[XEN][RFC PATCH V2 14/17] xl-parsing: Parse new device_models option
Add a new option "device_models". The user can specify the capabilities of
each QEMU (ui, vifs, ...). This option only works with upstream QEMU
(qemu-xen).

For instance:
device_models= [ 'name=all,vifs=nic1', 'name=qvga,ui', 'name=qide,ide' ]

Each device model can also take a path argument which overrides the default
one. It's useful for debugging.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 tools/libxl/Makefile     |    2 +-
 tools/libxl/libxlu_dm.c  |   96 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/libxl/libxlutil.h  |    5 ++
 tools/libxl/xl_cmdimpl.c |   29 +++++++++++++-
 4 files changed, 130 insertions(+), 2 deletions(-)
 create mode 100644 tools/libxl/libxlu_dm.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 47fb110..2b58721 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -79,7 +79,7 @@ AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h \
 AUTOSRCS= libxlu_cfg_y.c libxlu_cfg_l.c
 AUTOSRCS += _libxl_save_msgs_callout.c _libxl_save_msgs_helper.c
 LIBXLU_OBJS = libxlu_cfg_y.o libxlu_cfg_l.o libxlu_cfg.o \
-        libxlu_disk_l.o libxlu_disk.o libxlu_vif.o libxlu_pci.o
+        libxlu_disk_l.o libxlu_disk.o libxlu_vif.o libxlu_pci.o libxlu_dm.o
 $(LIBXLU_OBJS): CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
 
 CLIENTS = xl testidl libxl-save-helper
diff --git a/tools/libxl/libxlu_dm.c b/tools/libxl/libxlu_dm.c
new file mode 100644
index 0000000..9f0a347
--- /dev/null
+++ b/tools/libxl/libxlu_dm.c
@@ -0,0 +1,96 @@
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include <stdlib.h>
+#include "libxlu_internal.h"
+#include "libxlu_cfg_i.h"
+
+static void split_string_into_string_list(const char *str,
+                                          const char *delim,
+                                          libxl_string_list *psl)
+{
+    char *s, *saveptr;
+    const char *p;
+    libxl_string_list sl;
+
+    int i = 0, nr = 0;
+
+    s = strdup(str);
+    if (s == NULL) {
+        fprintf(stderr, "xlu_dm: unable to allocate memory\n");
+        exit(-1);
+    }
+
+    /* Count number of entries */
+    p = strtok_r(s, delim,
&saveptr); + do { + nr++; + } while ((p = strtok_r(NULL, delim, &saveptr))); + + free(s); + + s = strdup(str); + + sl = malloc((nr+1) * sizeof (char *)); + if (sl == NULL) { + fprintf(stderr, "xlu_dm: unable to allocate memory\n"); + exit(-1); + } + + p = strtok_r(s, delim, &saveptr); + do { + assert(i < nr); + // Skip blank + while (*p == '' '') + p++; + sl[i] = strdup(p); + i++; + } while ((p = strtok_r(NULL, delim, &saveptr))); + sl[i] = NULL; + + *psl = sl; + + free(s); +} + +int xlu_dm_parse(XLU_Config *cfg, const char *spec, + libxl_dm *dm) +{ + char *buf = strdup(spec); + char *p, *p2; + int rc = 0; + + p = strtok(buf, ","); + if (!p) + goto skip_dm; + do { + while (*p == '' '') + p++; + if ((p2 = strchr(p, ''='')) == NULL) { + if (!strcmp(p, "ui")) + dm->capabilities |= LIBXL_DM_CAP_UI; + else if (!strcmp(p, "ide")) + dm->capabilities |= LIBXL_DM_CAP_IDE; + else if (!strcmp(p, "serial")) + dm->capabilities |= LIBXL_DM_CAP_SERIAL; + else if (!strcmp(p, "audio")) + dm->capabilities |= LIBXL_DM_CAP_AUDIO; + } else { + *p2 = ''\0''; + if (!strcmp(p, "name")) + dm->name = strdup(p2 + 1); + else if (!strcmp(p, "path")) + dm->path = strdup(p2 + 1); + else if (!strcmp(p, "vifs")) + split_string_into_string_list(p2 + 1, ";", &dm->vifs); + } + } while ((p = strtok(NULL, ",")) != NULL); + + if (!dm->name && dm->path) + { + fprintf(stderr, "xl: Unable to parse device_deamon\n"); + exit(-ERROR_FAIL); + } +skip_dm: + free(buf); + + return rc; +} diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h index 0333e55..db22715 100644 --- a/tools/libxl/libxlutil.h +++ b/tools/libxl/libxlutil.h @@ -93,6 +93,11 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char *const *specs, */ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char *str); +/* + * Daemon specification parsing. + */ +int xlu_dm_parse(XLU_Config *cfg, const char *spec, + libxl_dm *dm); /* * Vif rate parsing. 
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 138cd72..2a26fa4 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -561,7 +561,7 @@ static void parse_config_data(const char *config_source, const char *buf; long l; XLU_Config *config; - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids; + XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *dms; int pci_power_mgmt = 0; int pci_msitranslate = 0; int pci_permissive = 0; @@ -995,6 +995,9 @@ static void parse_config_data(const char *config_source, } else if (!strcmp(p, "vifname")) { free(nic->ifname); nic->ifname = strdup(p2 + 1); + } else if (!strcmp(p, "id")) { + free(nic->id); + nic->id = strdup(p2 + 1); } else if (!strcmp(p, "backend")) { if(libxl_name_to_domid(ctx, (p2 + 1), &(nic->backend_domid))) { fprintf(stderr, "Specified backend domain does not exist, defaulting to Dom0\n"); @@ -1249,6 +1252,30 @@ skip_vfb: } } } + + d_config->num_dms = 0; + d_config->dms = NULL; + + if (b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN + && !xlu_cfg_get_list (config, "device_models", &dms, 0, 0)) { + while ((buf = xlu_cfg_get_listitem (dms, d_config->num_dms)) != NULL) { + libxl_dm *dm; + size_t size = sizeof (libxl_dm) * (d_config->num_dms + 1); + + d_config->dms = (libxl_dm *)realloc (d_config->dms, size); + if (!d_config->dms) { + fprintf(stderr, "Can''t realloc d_config->dms\n"); + exit (1); + } + dm = d_config->dms + d_config->num_dms; + libxl_dm_init (dm); + if (xlu_dm_parse(config, buf, dm)) { + exit (-ERROR_FAIL); + } + d_config->num_dms++; + } + } + #define parse_extra_args(type) \ e = xlu_cfg_get_list_as_string_list(config, "device_model_args"#type, \ &b_info->extra##type, 0); \ -- Julien Grall
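To make the format of a "device_models" entry concrete, here is a standalone sketch of a parser for a single entry such as 'name=qall,ui,ide'. It is an illustration under stated assumptions, not the code from libxlu_dm.c: struct dm_spec, dm_spec_parse() and the DM_CAP_* names are hypothetical, "vifs" is kept as the raw ';'-separated string instead of a libxl_string_list, and, unlike the patch, the sketch uses strtok_r() throughout and rejects unknown keywords instead of ignoring them.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() and strtok_r() */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative names; values follow the libxl_dm_cap enumeration. */
enum { DM_CAP_UI = 1, DM_CAP_IDE = 2, DM_CAP_SERIAL = 4, DM_CAP_AUDIO = 8 };

/* Hypothetical, simplified stand-in for libxl_dm. */
struct dm_spec {
    char *name;
    char *path;
    char *vifs;     /* raw "nic1;nic2" string, not split here */
    uint64_t caps;
};

static void dm_spec_clear(struct dm_spec *dm)
{
    free(dm->name);
    free(dm->path);
    free(dm->vifs);
    *dm = (struct dm_spec){ 0 };
}

/* Replace *dst with a copy of val; returns -1 if the copy fails. */
static int dm_set_string(char **dst, const char *val)
{
    char *copy = strdup(val);

    if (!copy)
        return -1;
    free(*dst);
    *dst = copy;
    return 0;
}

/* Parse one entry; returns 0 on success, -1 on error (dm is then cleared). */
static int dm_spec_parse(const char *spec, struct dm_spec *dm)
{
    char *buf, *tok, *save = NULL;
    int rc = 0;

    *dm = (struct dm_spec){ 0 };
    buf = strdup(spec);
    if (!buf)
        return -1;

    for (tok = strtok_r(buf, ",", &save); tok && !rc;
         tok = strtok_r(NULL, ",", &save)) {
        char *eq;

        while (*tok == ' ')         /* skip leading blanks */
            tok++;
        eq = strchr(tok, '=');
        if (!eq) {                  /* bare keyword: a capability */
            if (!strcmp(tok, "ui"))
                dm->caps |= DM_CAP_UI;
            else if (!strcmp(tok, "ide"))
                dm->caps |= DM_CAP_IDE;
            else if (!strcmp(tok, "serial"))
                dm->caps |= DM_CAP_SERIAL;
            else if (!strcmp(tok, "audio"))
                dm->caps |= DM_CAP_AUDIO;
            else
                rc = -1;
        } else {                    /* key=value pair */
            *eq = '\0';
            if (!strcmp(tok, "name"))
                rc = dm_set_string(&dm->name, eq + 1);
            else if (!strcmp(tok, "path"))
                rc = dm_set_string(&dm->path, eq + 1);
            else if (!strcmp(tok, "vifs"))
                rc = dm_set_string(&dm->vifs, eq + 1);
            else
                rc = -1;
        }
    }

    free(buf);
    if (rc)
        dm_spec_clear(dm);
    return rc;
}
```

The caller owns the strings in the result and releases them with dm_spec_clear(). Whether unknown keywords should be an error or silently skipped is a design choice; the sketch picks the stricter behaviour so that typos in a config file are noticed.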
Julien Grall
2012-Aug-22 12:32 UTC
[XEN][RFC PATCH V2 15/17] xl: support spawn/destroy on multiple device model
Old configuration files still work with QEMU disaggregation.
Before spawning any QEMU, the toolstack fills in the configuration
structure where needed.
For the moment, the toolstack spawns device models one by one.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
---
 tools/libxl/libxl.c          |   16 ++-
 tools/libxl/libxl_create.c   |  150 +++++++++++++-----
 tools/libxl/libxl_device.c   |    7 +-
 tools/libxl/libxl_dm.c       |  369 ++++++++++++++++++++++++++++++------------
 tools/libxl/libxl_dom.c      |    4 +-
 tools/libxl/libxl_internal.h |   36 +++--
 6 files changed, 421 insertions(+), 161 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index 8ea3478..60718b6 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -1330,7 +1330,8 @@ static void stubdom_destroy_callback(libxl__egc *egc,
     }
 
     dds->stubdom_finished = 1;
-    savefile = libxl__device_model_savefile(gc, dis->domid);
+    /* FIXME: get dmid */
+    savefile = libxl__device_model_savefile(gc, dis->domid, 0);
     rc = libxl__remove_file(gc, savefile);
     /*
      * On suspend libxl__domain_save_device_model will have already
@@ -1423,10 +1424,8 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis)
         LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_pause failed for %d", domid);
     }
     if (dm_present) {
-        if (libxl__destroy_device_model(gc, domid) < 0)
-            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl__destroy_device_model failed for %d", domid);
-
-        libxl__qmp_cleanup(gc, domid);
+        if (libxl__destroy_device_models(gc, domid) < 0)
+            LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl__destroy_device_models failed for %d", domid);
     }
     dis->drs.ao = ao;
     dis->drs.domid = domid;
@@ -1725,6 +1724,13 @@ out:
 
 /******************************************************************************/
 
+int libxl__dm_setdefault(libxl__gc *gc, libxl_dm *dm)
+{
+    return 0;
+}
+
+/******************************************************************************/
+
 int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk)
 {
int rc; diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 5f0d26f..7160c78 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -35,6 +35,10 @@ void libxl_domain_config_dispose(libxl_domain_config *d_config) { int i; + for (i=0; i<d_config->num_dms; i++) + libxl_dm_dispose(&d_config->dms[i]); + free(d_config->dms); + for (i=0; i<d_config->num_disks; i++) libxl_device_disk_dispose(&d_config->disks[i]); free(d_config->disks); @@ -59,6 +63,50 @@ void libxl_domain_config_dispose(libxl_domain_config *d_config) libxl_domain_build_info_dispose(&d_config->b_info); } +static int libxl__domain_config_setdefault(libxl__gc *gc, + libxl_domain_config *d_config) +{ + libxl_domain_build_info *b_info = &d_config->b_info; + uint64_t cap = 0; + int i = 0; + int ret = 0; + libxl_dm *default_dm = NULL; + + if (b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL + && (d_config->num_dms > 1)) + return ERROR_INVAL; + + if (!d_config->num_dms) { + d_config->dms = malloc(sizeof (*d_config->dms)); + if (!d_config->dms) + return ERROR_NOMEM; + libxl_dm_init(d_config->dms); + d_config->num_dms = 1; + } + + for (i = 0; i < d_config->num_dms; i++) + { + ret = libxl__dm_setdefault(gc, &d_config->dms[i]); + if (ret) return ret; + + if (cap & d_config->dms[i].capabilities) + /* Some capabilities are already emulated */ + return ERROR_INVAL; + + cap |= d_config->dms[i].capabilities; + if (d_config->dms[i].capabilities & LIBXL_DM_CAP_UI) + default_dm = &d_config->dms[i]; + } + + if (!default_dm) + default_dm = &d_config->dms[0]; + + /* The default device model emulates all that the others don''t emulate */ + default_dm->capabilities |= ~cap; + + return ret; +} + int libxl__domain_create_info_setdefault(libxl__gc *gc, libxl_domain_create_info *c_info) { @@ -145,11 +193,11 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL; else { const char *dm; - int rc; + int rc = 0; 
b_info->device_model_version LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN; - dm = libxl__domain_device_model(gc, b_info); + dm = libxl__domain_device_model(gc, ~0, b_info); rc = access(dm, X_OK); if (rc < 0) { /* qemu-xen unavailable, use qemu-xen-traditional */ @@ -651,11 +699,13 @@ static void initiate_domain_create(libxl__egc *egc, } dcs->guest_domid = domid; - dcs->dmss.dm.guest_domid = 0; /* means we haven''t spawned */ ret = libxl__domain_build_info_setdefault(gc, &d_config->b_info); if (ret) goto error_out; + ret = libxl__domain_config_setdefault(gc, d_config); + if (ret) goto error_out; + if (!sched_params_valid(gc, domid, &d_config->b_info.sched_params)) { LOG(ERROR, "Invalid scheduling parameters\n"); ret = ERROR_INVAL; @@ -667,6 +717,15 @@ static void initiate_domain_create(libxl__egc *egc, if (ret) goto error_out; } + dcs->current_dmid = 0; + dcs->build_state.num_dms = d_config->num_dms; + GCNEW_ARRAY(dcs->dmss, d_config->num_dms); + + for (i = 0; i < d_config->num_dms; i++) { + dcs->dmss[i].dm.guest_domid = 0; /* Means we haven''t spawned */ + dcs->dmss[i].dm.dcs = dcs; + } + dcs->bl.ao = ao; libxl_device_disk *bootdisk d_config->num_disks > 0 ? &d_config->disks[0] : NULL; @@ -709,6 +768,26 @@ static void domcreate_console_available(libxl__egc *egc, dcs->guest_domid)); } +static void domcreate_spawn_devmodel(libxl__egc *egc, + libxl__domain_create_state *dcs, + libxl_dmid dmid) +{ + libxl__stub_dm_spawn_state *dmss = &dcs->dmss[dmid]; + STATE_AO_GC(dcs->ao); + + /* We might be going to call libxl__spawn_local_dm, or _spawn_stub_dm. + * Fill in any field required by either, including both relevant + * callbacks (_spawn_stub_dm will overwrite our trespass if needed). 
*/ + dmss->dm.spawn.ao = ao; + dmss->dm.guest_config = dcs->guest_config; + dmss->dm.build_state = &dcs->build_state; + dmss->dm.callback = domcreate_devmodel_started; + dmss->callback = domcreate_devmodel_started; + dmss->dm.dmid = dmid; + + libxl__spawn_dm(egc, dmss); +} + static void domcreate_bootloader_done(libxl__egc *egc, libxl__bootloader_state *bl, int rc) @@ -735,15 +814,6 @@ static void domcreate_bootloader_done(libxl__egc *egc, */ state->pv_cmdline = bl->cmdline; - /* We might be going to call libxl__spawn_local_dm, or _spawn_stub_dm. - * Fill in any field required by either, including both relevant - * callbacks (_spawn_stub_dm will overwrite our trespass if needed). */ - dcs->dmss.dm.spawn.ao = ao; - dcs->dmss.dm.guest_config = dcs->guest_config; - dcs->dmss.dm.build_state = &dcs->build_state; - dcs->dmss.dm.callback = domcreate_devmodel_started; - dcs->dmss.callback = domcreate_devmodel_started; - if ( restore_fd < 0 ) { rc = libxl__domain_build(gc, &d_config->b_info, domid, state); domcreate_rebuild_done(egc, dcs, rc); @@ -962,11 +1032,7 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev, libxl__device_vkb_add(gc, domid, &vkb); libxl_device_vkb_dispose(&vkb); - dcs->dmss.dm.guest_domid = domid; - if (libxl_defbool_val(d_config->b_info.device_model_stubdomain)) - libxl__spawn_stub_dm(egc, &dcs->dmss); - else - libxl__spawn_local_dm(egc, &dcs->dmss.dm); + domcreate_spawn_devmodel(egc, dcs, dcs->current_dmid); return; } case LIBXL_DOMAIN_TYPE_PV: @@ -991,12 +1057,11 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev, libxl__device_console_dispose(&console); if (need_qemu) { - dcs->dmss.dm.guest_domid = domid; - libxl__spawn_local_dm(egc, &dcs->dmss.dm); + assert(dcs->dmss); + domcreate_spawn_devmodel(egc, dcs, dcs->current_dmid); return; } else { - assert(!dcs->dmss.dm.guest_domid); - domcreate_devmodel_started(egc, &dcs->dmss.dm, 0); + assert(!dcs->dmss); return; } } @@ -1015,7 +1080,7 @@ static 
void domcreate_devmodel_started(libxl__egc *egc, libxl__dm_spawn_state *dmss, int ret) { - libxl__domain_create_state *dcs = CONTAINER_OF(dmss, *dcs, dmss.dm); + libxl__domain_create_state *dcs = dmss->dcs; STATE_AO_GC(dmss->spawn.ao); libxl_ctx *ctx = CTX; int domid = dcs->guest_domid; @@ -1029,15 +1094,15 @@ static void domcreate_devmodel_started(libxl__egc *egc, goto error_out; } - if (dcs->dmss.dm.guest_domid) { + if (dmss->guest_domid) { if (d_config->b_info.device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN) { - libxl__qmp_initializations(gc, domid, d_config); + libxl__qmp_initializations(gc, domid, dmss->dmid, d_config); } } /* Plug nic interfaces */ - if (d_config->num_nics > 0) { + if (d_config->num_nics > 0 && dmss->dmid == 0) { /* Attach nics */ libxl__multidev_begin(ao, &dcs->multidev); dcs->multidev.callback = domcreate_attach_pci; @@ -1071,23 +1136,34 @@ static void domcreate_attach_pci(libxl__egc *egc, libxl__multidev *multidev, goto error_out; } - for (i = 0; i < d_config->num_pcidevs; i++) - libxl__device_pci_add(gc, domid, &d_config->pcidevs[i], 1); + /* TO FIX: for the moment only add to device model 0 */ - if (d_config->num_pcidevs > 0) { - ret = libxl__create_pci_backend(gc, domid, d_config->pcidevs, - d_config->num_pcidevs); - if (ret < 0) { - LIBXL__LOG(ctx, LIBXL__LOG_ERROR, - "libxl_create_pci_backend failed: %d", ret); - goto error_out; + if (dcs->current_dmid == 0) { + for (i = 0; i < d_config->num_pcidevs; i++) + libxl__device_pci_add(gc, domid, + &d_config->pcidevs[i], 1); + + if (d_config->num_pcidevs > 0) { + ret = libxl__create_pci_backend(gc, domid, d_config->pcidevs, + d_config->num_pcidevs); + if (ret < 0) { + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, + "libxl_create_pci_backend failed: %d", ret); + goto error_out; + } } } - libxl__arch_domain_create(gc, d_config, domid); - domcreate_console_available(egc, dcs); + dcs->current_dmid++; + + if (dcs->current_dmid >= dcs->guest_config->num_dms) { + libxl__arch_domain_create(gc, 
d_config, domid); + domcreate_console_available(egc, dcs); + domcreate_complete(egc, dcs, 0); + } else { + domcreate_spawn_devmodel(egc, dcs, dcs->current_dmid); + } - domcreate_complete(egc, dcs, 0); return; error_out: diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c index 8e8410e..798665e 100644 --- a/tools/libxl/libxl_device.c +++ b/tools/libxl/libxl_device.c @@ -1034,8 +1034,8 @@ static void devices_remove_callback(libxl__egc *egc, return; } -int libxl__wait_for_device_model(libxl__gc *gc, - uint32_t domid, char *state, +int libxl__wait_for_device_model(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid, char *state, libxl__spawn_starting *spawning, int (*check_callback)(libxl__gc *gc, uint32_t domid, @@ -1044,7 +1044,8 @@ int libxl__wait_for_device_model(libxl__gc *gc, void *check_callback_userdata) { char *path; - path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid); + path = libxl__sprintf(gc, "/local/domain/0/dms/%u/%u/state", + domid, dmid); return libxl__wait_for_offspring(gc, domid, LIBXL_DEVICE_MODEL_START_TIMEOUT, "Device Model", path, state, spawning, diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 0c0084f..de7138f 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -28,24 +28,30 @@ static const char *libxl_tapif_script(libxl__gc *gc) #endif } -const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid) +const char *libxl__device_model_savefile(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid) { - return libxl__sprintf(gc, "/var/lib/xen/qemu-save.%d", domid); + return libxl__sprintf(gc, "/var/lib/xen/qemu-save.%u.%u", domid, dmid); } const char *libxl__domain_device_model(libxl__gc *gc, - const libxl_domain_build_info *info) + uint32_t dmid, + const libxl_domain_build_info *b_info) { libxl_ctx *ctx = libxl__gc_owner(gc); const char *dm; + libxl_domain_config *guest_config = CONTAINER_OF(b_info, *guest_config, + b_info); - if 
(libxl_defbool_val(info->device_model_stubdomain)) + if (libxl_defbool_val(guest_config->b_info.device_model_stubdomain)) return NULL; - if (info->device_model) { - dm = libxl__strdup(gc, info->device_model); + if (dmid < guest_config->num_dms && guest_config->dms[dmid].path) { + dm = libxl__strdup(gc, guest_config->dms[dmid].path); + } else if (guest_config->b_info.device_model) { + dm = libxl__strdup(gc, guest_config->b_info.device_model); } else { - switch (info->device_model_version) { + switch (guest_config->b_info.device_model_version) { case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: dm = libxl__abs_path(gc, "qemu-dm", libxl__libexec_path()); break; @@ -55,7 +61,7 @@ const char *libxl__domain_device_model(libxl__gc *gc, default: LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "invalid device model version %d\n", - info->device_model_version); + guest_config->b_info.device_model_version); dm = NULL; break; } @@ -63,7 +69,8 @@ const char *libxl__domain_device_model(libxl__gc *gc, return dm; } -const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *guest_config) +const libxl_vnc_info *libxl__dm_vnc(libxl_dmid dmid, + const libxl_domain_config *guest_config) { const libxl_vnc_info *vnc = NULL; if (guest_config->b_info.type == LIBXL_DOMAIN_TYPE_HVM) { @@ -103,7 +110,7 @@ static char ** libxl__build_device_model_args_old(libxl__gc *gc, const libxl_domain_create_info *c_info = &guest_config->c_info; const libxl_domain_build_info *b_info = &guest_config->b_info; const libxl_device_nic *nics = guest_config->nics; - const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config); + const libxl_vnc_info *vnc = libxl__dm_vnc(0, guest_config); const libxl_sdl_info *sdl = dm_sdl(guest_config); const int num_nics = guest_config->num_nics; const char *keymap = dm_keymap(guest_config); @@ -321,24 +328,58 @@ static char *dm_spice_options(libxl__gc *gc, return opt; } +static int libxl__dm_has_vif(const char *vifname, libxl_dmid dmid, + const libxl_domain_config *guest_config) +{ + 
const libxl_dm *dm_config = &guest_config->dms[dmid]; + int i = 0; + + if (!vifname && (dm_config->capabilities & LIBXL_DM_CAP_UI)) + return 1; + + if (!dm_config->vifs) + return 0; + + for (i = 0; dm_config->vifs[i]; i++) { + if (!strcmp(dm_config->vifs[i], vifname)) + return 1; + } + + return 0; +} + static char ** libxl__build_device_model_args_new(libxl__gc *gc, - const char *dm, int guest_domid, + const char *dm, libxl_dmid guest_domid, + libxl_dmid dmid, const libxl_domain_config *guest_config, const libxl__domain_build_state *state) { + /** + * PCI device number. Before 3, we have IDE, ISA, SouthBridge and + * XEN PCI. Theses devices will be emulate in each QEMU, but only + * one QEMU (the one which emulates default device) will register + * these devices through Xen PCI hypercall. + */ + static unsigned int bdf = 3; + libxl_ctx *ctx = libxl__gc_owner(gc); const libxl_domain_create_info *c_info = &guest_config->c_info; const libxl_domain_build_info *b_info = &guest_config->b_info; + const libxl_dm *dm_config = &guest_config->dms[dmid]; const libxl_device_disk *disks = guest_config->disks; const libxl_device_nic *nics = guest_config->nics; const int num_disks = guest_config->num_disks; const int num_nics = guest_config->num_nics; - const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config); + const libxl_vnc_info *vnc = libxl__dm_vnc(dmid, guest_config); const libxl_sdl_info *sdl = dm_sdl(guest_config); const char *keymap = dm_keymap(guest_config); flexarray_t *dm_args; int i; uint64_t ram_size; + uint32_t cap_ui = dm_config->capabilities & LIBXL_DM_CAP_UI; + uint32_t cap_ide = dm_config->capabilities & LIBXL_DM_CAP_IDE; + uint32_t cap_serial = dm_config->capabilities & LIBXL_DM_CAP_SERIAL; + uint32_t cap_audio = dm_config->capabilities & LIBXL_DM_CAP_AUDIO; dm_args = flexarray_make(16, 1); if (!dm_args) @@ -348,11 +389,12 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, "-xen-domid", libxl__sprintf(gc, "%d", guest_domid), NULL); + 
flexarray_append(dm_args, "-nodefaults"); flexarray_append(dm_args, "-chardev"); flexarray_append(dm_args, libxl__sprintf(gc, "socket,id=libxl-cmd," - "path=%s/qmp-libxl-%d,server,nowait", - libxl__run_dir_path(), guest_domid)); + "path=%s/qmp-libxl-%u-%u,server,nowait", + libxl__run_dir_path(), guest_domid, dmid)); flexarray_append(dm_args, "-mon"); flexarray_append(dm_args, "chardev=libxl-cmd,mode=control"); @@ -364,7 +406,8 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, if (c_info->name) { flexarray_vappend(dm_args, "-name", c_info->name, NULL); } - if (vnc) { + + if (vnc && cap_ui) { int display = 0; const char *listen = "127.0.0.1"; char *vncarg = NULL; @@ -395,7 +438,7 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, } flexarray_append(dm_args, vncarg); } - if (sdl) { + if (sdl && cap_ui) { flexarray_append(dm_args, "-sdl"); /* XXX sdl->{display,xauthority} into $DISPLAY/$XAUTHORITY */ } @@ -411,13 +454,27 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) { int ioemu_nics = 0; - if (b_info->u.hvm.serial) { + if (b_info->u.hvm.serial && cap_serial) { flexarray_vappend(dm_args, "-serial", b_info->u.hvm.serial, NULL); } - if (libxl_defbool_val(b_info->u.hvm.nographic) && (!sdl && !vnc)) { + if ((libxl_defbool_val(b_info->u.hvm.nographic) && (!sdl && !vnc)) + || !cap_ui) { flexarray_append(dm_args, "-nographic"); } + else { + switch (b_info->u.hvm.vga.kind) { + case LIBXL_VGA_INTERFACE_TYPE_STD: + flexarray_vappend(dm_args, "-device", + GCSPRINTF("VGA,addr=%u", bdf++), NULL); + break; + case LIBXL_VGA_INTERFACE_TYPE_CIRRUS: + flexarray_vappend(dm_args, "-device", + GCSPRINTF("cirrus-vga,addr=%u", bdf++), + NULL); + break; + } + } if (libxl_defbool_val(b_info->u.hvm.spice.enable)) { const libxl_spice_info *spice = &b_info->u.hvm.spice; @@ -429,27 +486,19 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, flexarray_append(dm_args, spiceoptions); } - 
switch (b_info->u.hvm.vga.kind) { - case LIBXL_VGA_INTERFACE_TYPE_STD: - flexarray_vappend(dm_args, "-vga", "std", NULL); - break; - case LIBXL_VGA_INTERFACE_TYPE_CIRRUS: - flexarray_vappend(dm_args, "-vga", "cirrus", NULL); - break; - } - if (b_info->u.hvm.boot) { flexarray_vappend(dm_args, "-boot", libxl__sprintf(gc, "order=%s", b_info->u.hvm.boot), NULL); } - if (libxl_defbool_val(b_info->u.hvm.usb) || b_info->u.hvm.usbdevice) { + if ((libxl_defbool_val(b_info->u.hvm.usb) || b_info->u.hvm.usbdevice) + && cap_ui) { flexarray_append(dm_args, "-usb"); if (b_info->u.hvm.usbdevice) { flexarray_vappend(dm_args, "-usbdevice", b_info->u.hvm.usbdevice, NULL); } } - if (b_info->u.hvm.soundhw) { + if (b_info->u.hvm.soundhw && cap_audio) { flexarray_vappend(dm_args, "-soundhw", b_info->u.hvm.soundhw, NULL); } if (!libxl_defbool_val(b_info->u.hvm.acpi)) { @@ -469,7 +518,8 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, b_info->max_vcpus)); } for (i = 0; i < num_nics; i++) { - if (nics[i].nictype == LIBXL_NIC_TYPE_VIF_IOEMU) { + if (nics[i].nictype == LIBXL_NIC_TYPE_VIF_IOEMU + && libxl__dm_has_vif(nics[i].id, dmid, guest_config)) { char *smac = libxl__sprintf(gc, LIBXL_MAC_FMT, LIBXL_MAC_BYTES(nics[i].mac)); const char *ifname = libxl__device_nic_devname(gc, @@ -477,9 +527,9 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, LIBXL_NIC_TYPE_VIF_IOEMU); flexarray_append(dm_args, "-device"); flexarray_append(dm_args, - libxl__sprintf(gc, "%s,id=nic%d,netdev=net%d,mac=%s", - nics[i].model, nics[i].devid, - nics[i].devid, smac)); + GCSPRINTF("%s,id=nic%d,netdev=net%d,mac=%s,addr=%u", + nics[i].model, nics[i].devid, + nics[i].devid, smac, bdf++)); flexarray_append(dm_args, "-netdev"); flexarray_append(dm_args, GCSPRINTF( "type=tap,id=net%d,ifname=%s," @@ -495,7 +545,7 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, flexarray_append(dm_args, "-net"); flexarray_append(dm_args, "none"); } - if 
(libxl_defbool_val(b_info->u.hvm.gfx_passthru)) { + if (libxl_defbool_val(b_info->u.hvm.gfx_passthru) && cap_ui) { flexarray_append(dm_args, "-gfx_passthru"); } } else { @@ -506,13 +556,14 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, if (state->saved_state) { /* This file descriptor is meant to be used by QEMU */ - int migration_fd = open(state->saved_state, O_RDONLY); + int migration_fd = open(libxl__sprintf(gc, "%s.%u", state->saved_state, + dmid), O_RDONLY); flexarray_append(dm_args, "-incoming"); flexarray_append(dm_args, libxl__sprintf(gc, "fd:%d", migration_fd)); } for (i = 0; b_info->extra && b_info->extra[i] != NULL; i++) flexarray_append(dm_args, b_info->extra[i]); - flexarray_append(dm_args, "-M"); + flexarray_append(dm_args, "-machine"); switch (b_info->type) { case LIBXL_DOMAIN_TYPE_PV: flexarray_append(dm_args, "xenpv"); @@ -520,7 +571,11 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, flexarray_append(dm_args, b_info->extra_pv[i]); break; case LIBXL_DOMAIN_TYPE_HVM: - flexarray_append(dm_args, "xenfv"); + flexarray_append(dm_args, + libxl__sprintf(gc, + "xenfv,xen_dmid=%u,xen_default_dev=%s,xen_emulate_ide=%s", + dmid, (cap_ui) ? "on" : "off", + (cap_ide) ? 
"on" : "off")); for (i = 0; b_info->extra_hvm && b_info->extra_hvm[i] != NULL; i++) flexarray_append(dm_args, b_info->extra_hvm[i]); break; @@ -528,65 +583,69 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, abort(); } - ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb); + // Allocate ram space of 32Mo per previous device model to store rom + ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb) + + 32 * dmid; flexarray_append(dm_args, "-m"); flexarray_append(dm_args, libxl__sprintf(gc, "%"PRId64, ram_size)); if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) { - for (i = 0; i < num_disks; i++) { - int disk, part; - int dev_number - libxl__device_disk_dev_number(disks[i].vdev, &disk, &part); - const char *format = qemu_disk_format_string(disks[i].format); - char *drive; - - if (dev_number == -1) { - LIBXL__LOG(ctx, LIBXL__LOG_WARNING, "unable to determine" - " disk number for %s", disks[i].vdev); - continue; - } - - if (disks[i].is_cdrom) { - if (disks[i].format == LIBXL_DISK_FORMAT_EMPTY) - drive = libxl__sprintf - (gc, "if=ide,index=%d,media=cdrom", disk); - else - drive = libxl__sprintf - (gc, "file=%s,if=ide,index=%d,media=cdrom,format=%s", - disks[i].pdev_path, disk, format); - } else { - if (disks[i].format == LIBXL_DISK_FORMAT_EMPTY) { - LIBXL__LOG(ctx, LIBXL__LOG_WARNING, "cannot support" - " empty disk format for %s", disks[i].vdev); + if (cap_ide) { + for (i = 0; i < num_disks; i++) { + int disk, part; + int dev_number + libxl__device_disk_dev_number(disks[i].vdev, &disk, &part); + const char *format = qemu_disk_format_string(disks[i].format); + char *drive; + + if (dev_number == -1) { + LIBXL__LOG(ctx, LIBXL__LOG_WARNING, "unable to determine" + " disk number for %s", disks[i].vdev); continue; } - if (format == NULL) { - LIBXL__LOG(ctx, LIBXL__LOG_WARNING, "unable to determine" - " disk image format %s", disks[i].vdev); - continue; + if (disks[i].is_cdrom) { + if (disks[i].format == 
LIBXL_DISK_FORMAT_EMPTY) + drive = libxl__sprintf + (gc, "if=ide,index=%d,media=cdrom", disk); + else + drive = libxl__sprintf + (gc, "file=%s,if=ide,index=%d,media=cdrom,format=%s", + disks[i].pdev_path, disk, format); + } else { + if (disks[i].format == LIBXL_DISK_FORMAT_EMPTY) { + LIBXL__LOG(ctx, LIBXL__LOG_WARNING, "cannot support" + " empty disk format for %s", disks[i].vdev); + continue; + } + + if (format == NULL) { + LIBXL__LOG(ctx, LIBXL__LOG_WARNING, "unable to determine" + " disk image format %s", disks[i].vdev); + continue; + } + + /* + * Explicit sd disks are passed through as is. + * + * For other disks we translate devices 0..3 into + * hd[a-d] and ignore the rest. + */ + if (strncmp(disks[i].vdev, "sd", 2) == 0) + drive = libxl__sprintf + (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s", + disks[i].pdev_path, disk, format); + else if (disk < 4) + drive = libxl__sprintf + (gc, "file=%s,if=ide,index=%d,media=disk,format=%s", + disks[i].pdev_path, disk, format); + else + continue; /* Do not emulate this disk */ } - /* - * Explicit sd disks are passed through as is. - * - * For other disks we translate devices 0..3 into - * hd[a-d] and ignore the rest. 
- */ - if (strncmp(disks[i].vdev, "sd", 2) == 0) - drive = libxl__sprintf - (gc, "file=%s,if=scsi,bus=0,unit=%d,format=%s", - disks[i].pdev_path, disk, format); - else if (disk < 4) - drive = libxl__sprintf - (gc, "file=%s,if=ide,index=%d,media=disk,format=%s", - disks[i].pdev_path, disk, format); - else - continue; /* Do not emulate this disk */ + flexarray_append(dm_args, "-drive"); + flexarray_append(dm_args, drive); } - - flexarray_append(dm_args, "-drive"); - flexarray_append(dm_args, drive); } } flexarray_append(dm_args, NULL); @@ -594,7 +653,9 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, } static char ** libxl__build_device_model_args(libxl__gc *gc, - const char *dm, int guest_domid, + const char *dm, + libxl_domid guest_domid, + libxl_dmid dmid, const libxl_domain_config *guest_config, const libxl__domain_build_state *state) { @@ -607,8 +668,8 @@ static char ** libxl__build_device_model_args(libxl__gc *gc, state); case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: return libxl__build_device_model_args_new(gc, dm, - guest_domid, guest_config, - state); + guest_domid, dmid, + guest_config, state); default: LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown device model version %d", guest_config->b_info.device_model_version); @@ -729,7 +790,8 @@ char *libxl__stub_dm_name(libxl__gc *gc, const char *guest_name) return libxl__sprintf(gc, "%s-dm", guest_name); } -void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *sdss) +static void libxl__spawn_stub_dm(libxl__egc *egc, + libxl__stub_dm_spawn_state *sdss) { STATE_AO_GC(sdss->dm.spawn.ao); libxl_ctx *ctx = libxl__gc_owner(gc); @@ -815,7 +877,7 @@ void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *sdss) if (ret) goto out; - args = libxl__build_device_model_args(gc, "stubdom-dm", guest_domid, + args = libxl__build_device_model_args(gc, "stubdom-dm", guest_domid, 0, guest_config, d_state); if (!args) { ret = ERROR_FAIL; @@ -871,12 +933,16 @@ out: 
spawn_stubdom_pvqemu_cb(egc, &sdss->pvqemu, ret); } +static void libxl__spawn_local_dm(libxl__egc *egc, + libxl__dm_spawn_state *sdss); + static void spawn_stub_launch_dm(libxl__egc *egc, libxl__multidev *multidev, int ret) { libxl__stub_dm_spawn_state *sdss = CONTAINER_OF(multidev, *sdss, multidev); STATE_AO_GC(sdss->dm.spawn.ao); libxl_ctx *ctx = libxl__gc_owner(gc); + libxl_dmid dmid = sdss->dm.dmid; int i, num_console = STUBDOM_SPECIAL_CONSOLES; libxl__device_console *console; @@ -937,7 +1003,8 @@ static void spawn_stub_launch_dm(libxl__egc *egc, break; case STUBDOM_CONSOLE_SAVE: console[i].output = libxl__sprintf(gc, "file:%s", - libxl__device_model_savefile(gc, guest_domid)); + libxl__device_model_savefile(gc, guest_domid, + dmid)); break; case STUBDOM_CONSOLE_RESTORE: if (d_state->saved_state) @@ -1049,10 +1116,11 @@ static void device_model_spawn_outcome(libxl__egc *egc, libxl__dm_spawn_state *dmss, int rc); -void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) +static void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) { /* convenience aliases */ const int domid = dmss->guest_domid; + const libxl_dmid dmid = dmss->dmid; libxl__domain_build_state *const state = dmss->build_state; libxl__spawn_state *const spawn = &dmss->spawn; @@ -1062,7 +1130,8 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) libxl_domain_config *guest_config = dmss->guest_config; const libxl_domain_create_info *c_info = &guest_config->c_info; const libxl_domain_build_info *b_info = &guest_config->b_info; - const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config); + const libxl_vnc_info *vnc = libxl__dm_vnc(dmid, guest_config); + const libxl_dm *dm_config = &guest_config->dms[dmid]; char *path, *logfile; int logfile_w, null; int rc; @@ -1071,12 +1140,13 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) char *vm_path; char **pass_stuff; const char *dm; + const char *name; if 
(libxl_defbool_val(b_info->device_model_stubdomain)) { abort(); } - dm = libxl__domain_device_model(gc, b_info); + dm = libxl__domain_device_model(gc, dmid, b_info); if (!dm) { rc = ERROR_FAIL; goto out; @@ -1087,7 +1157,7 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) rc = ERROR_FAIL; goto out; } - args = libxl__build_device_model_args(gc, dm, domid, guest_config, state); + args = libxl__build_device_model_args(gc, dm, domid, dmid, guest_config, state); if (!args) { rc = ERROR_FAIL; goto out; @@ -1101,7 +1171,7 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) free(path); } - path = libxl__sprintf(gc, "/local/domain/0/device-model/%d", domid); + path = libxl__sprintf(gc, "/local/domain/0/dms/%u/%u", domid, dmid); xs_mkdir(ctx->xsh, XBT_NULL, path); if (b_info->type == LIBXL_DOMAIN_TYPE_HVM && @@ -1110,8 +1180,13 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss) libxl__xs_write(gc, XBT_NULL, libxl__sprintf(gc, "%s/disable_pf", path), "%d", !libxl_defbool_val(b_info->u.hvm.xen_platform_pci)); + + name = dm_config->name; + if (!name) + name = libxl__sprintf(gc, "%u", dmid); + libxl_create_logfile(ctx, - libxl__sprintf(gc, "qemu-dm-%s", c_info->name), + libxl__sprintf(gc, "qemu-%s-%s", name, c_info->name), &logfile); logfile_w = open(logfile, O_WRONLY|O_CREAT|O_APPEND, 0644); free(logfile); @@ -1143,10 +1218,10 @@ retry_transaction: for (arg = args; *arg; arg++) LIBXL__LOG(CTX, XTL_DEBUG, " %s", *arg); - spawn->what = GCSPRINTF("domain %d device model", domid); - spawn->xspath = GCSPRINTF("/local/domain/0/device-model/%d/state", domid); + spawn->what = GCSPRINTF("domain %d device model %s", domid, name); + spawn->xspath = GCSPRINTF("/local/domain/0/dms/%u/%u/state", domid, dmid); spawn->timeout_ms = LIBXL_DEVICE_MODEL_START_TIMEOUT * 1000; - spawn->pidpath = GCSPRINTF("%s/image/device-model-pid", dom_path); + spawn->pidpath = GCSPRINTF("%s/image/dms/%u-pid", dom_path, dmid); 
spawn->midproc_cb = libxl__spawn_record_pid; spawn->confirm_cb = device_model_confirm; spawn->failure_cb = device_model_startup_failed; @@ -1171,6 +1246,32 @@ out: device_model_spawn_outcome(egc, dmss, rc); } +void libxl__spawn_dm(libxl__egc *egc, libxl__stub_dm_spawn_state *dmss) +{ + libxl__domain_create_state *dcs = dmss->dm.dcs; + libxl_domain_config *const d_config = dcs->guest_config; + STATE_AO_GC(dmss->dm.spawn.ao); + + switch (d_config->c_info.type) { + case LIBXL_DOMAIN_TYPE_HVM: + { + dmss->dm.guest_domid = dcs->guest_domid; + if (libxl_defbool_val(d_config->b_info.device_model_stubdomain)) + libxl__spawn_stub_dm(egc, dmss); + else + libxl__spawn_local_dm(egc, &dmss->dm); + break; + } + case LIBXL_DOMAIN_TYPE_PV: + { + dmss->dm.guest_domid = dcs->guest_domid; + libxl__spawn_local_dm(egc, &dmss->dm); + break; + } + default: + LIBXL__LOG(CTX, XTL_ERROR, "Unknow type %u", d_config->c_info.type); + } +} static void device_model_confirm(libxl__egc *egc, libxl__spawn_state *spawn, const char *xsdata) @@ -1207,6 +1308,7 @@ static void device_model_spawn_outcome(libxl__egc *egc, { STATE_AO_GC(dmss->spawn.ao); int ret2; + char *filename; if (rc) LOG(ERROR, "%s: spawn failed (rc=%d)", dmss->spawn.what, rc); @@ -1214,10 +1316,11 @@ static void device_model_spawn_outcome(libxl__egc *egc, libxl__domain_build_state *state = dmss->build_state; if (state->saved_state) { - ret2 = unlink(state->saved_state); + filename = GCSPRINTF("%s.%u", state->saved_state, dmss->dmid); + ret2 = unlink(filename); if (ret2) { LOGE(ERROR, "%s: failed to remove device-model state %s", - dmss->spawn.what, state->saved_state); + dmss->spawn.what, filename); rc = ERROR_FAIL; goto out; } @@ -1229,12 +1332,14 @@ static void device_model_spawn_outcome(libxl__egc *egc, dmss->callback(egc, dmss, rc); } -int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid) +static int libxl__destroy_device_model(libxl__gc *gc, libxl_domid domid, + libxl_dmid dmid) { char *pid; int ret; - pid = 
libxl__xs_read(gc, XBT_NULL, libxl__sprintf(gc, "/local/domain/%d/image/device-model-pid", domid)); + pid = libxl__xs_read(gc, XBT_NULL, GCSPRINTF("/local/domain/%u/image/dms/%u-pid", + domid, dmid)); if (!pid || !atoi(pid)) { LOG(ERROR, "could not find device-model''s pid for dom %u", domid); ret = ERROR_FAIL; @@ -1300,6 +1405,60 @@ out: return ret; } +libxl_dmid *libxl__list_device_models(libxl__gc *gc, libxl_domid domid, + unsigned *num_dms) +{ + unsigned int i = 0; + char **dir = NULL; + libxl_dmid *dms = NULL; + unsigned int num = 0; + + dir = libxl__xs_directory(gc, XBT_NULL, + GCSPRINTF("/local/domain/0/dms/%u", domid), + &num); + if (dir) { + GCNEW_ARRAY(dms, num); + + if (num_dms) + *num_dms = num; + + for (i = 0; i < num; i++) { + dms[i] = atoi(dir[i]); + } + + return dms; + } + else + return NULL; +} + +int libxl__destroy_device_models(libxl__gc *gc, + libxl_domid domid) +{ + libxl_ctx *ctx = libxl__gc_owner(gc); + int ret = 0; + libxl_dmid *dms = NULL; + unsigned int num_dms; + unsigned int i; + + dms = libxl__list_device_models(gc, domid, &num_dms); + + if (!dms) + return ERROR_FAIL; + + for (i = 0; i < num_dms; i++) + ret |= libxl__destroy_device_model(gc, domid, dms[i]); + + if (!ret) { + xs_rm(ctx->xsh, XBT_NULL, libxl__sprintf(gc, "/local/domain/0/dms/%u", + domid)); + xs_rm(ctx->xsh, XBT_NULL, libxl__sprintf(gc, "/local/domain/0/device-model/%u", + domid)); + } + + return ret; + } + /* * Local variables: * mode: C diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 06d5e4f..475fea8 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -544,6 +544,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, libxl_ctx *ctx = libxl__gc_owner(gc); int ret, rc = ERROR_FAIL; const char *firmware = libxl__domain_firmware(gc, info); + libxl_domain_config *d_config = CONTAINER_OF(info, *d_config, b_info); if (!firmware) goto out; @@ -552,7 +553,8 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, domid, (info->max_memkb - 
info->video_memkb) / 1024, (info->target_memkb - info->video_memkb) / 1024, - firmware); + firmware, + state->num_dms * 2 + 1); if (ret) { LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, ret, "hvm building failed"); goto out; diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 71e4970..2e6eedc 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -867,6 +867,7 @@ typedef struct { unsigned long console_mfn; unsigned long vm_generationid_addr; + unsigned long num_dms; char *saved_state; @@ -887,7 +888,7 @@ _hidden int libxl__build_hvm(libxl__gc *gc, uint32_t domid, libxl_domain_build_info *info, libxl__domain_build_state *state); -_hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid, +_hidden int libxl__qemu_traditional_cmd(libxl__gc *gc, libxl_domid domid, const char *cmd); _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid, const char *old_name, const char *new_name, @@ -947,6 +948,8 @@ _hidden int libxl__domain_create_info_setdefault(libxl__gc *gc, libxl_domain_create_info *c_info); _hidden int libxl__domain_build_info_setdefault(libxl__gc *gc, libxl_domain_build_info *b_info); +_hidden int libxl__dm_setdefault(libxl__gc *gc, + libxl_dm *dm); _hidden int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk); _hidden int libxl__device_nic_setdefault(libxl__gc *gc, libxl_device_nic *nic, @@ -1042,7 +1045,9 @@ _hidden char *libxl__devid_to_localdev(libxl__gc *gc, int devid); /* from libxl_pci */ -_hidden int libxl__device_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, int starting); +_hidden int libxl__device_pci_add(libxl__gc *gc, libxl_domid domid, + libxl_device_pci *pcidev, + int starting); _hidden int libxl__create_pci_backend(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, int num); _hidden int libxl__device_pci_destroy_all(libxl__gc *gc, uint32_t domid); @@ -1272,6 +1277,7 @@ _hidden int libxl__domain_build(libxl__gc *gc, /* for device model 
creation */ _hidden const char *libxl__domain_device_model(libxl__gc *gc, + libxl_dmid dmid, const libxl_domain_build_info *info); _hidden int libxl__need_xenpv_qemu(libxl__gc *gc, int nr_consoles, libxl__device_console *consoles, @@ -1281,7 +1287,9 @@ _hidden int libxl__need_xenpv_qemu(libxl__gc *gc, * return pass *starting_r (which will be non-0) to * libxl__confirm_device_model_startup or libxl__detach_device_model. */ _hidden int libxl__wait_for_device_model(libxl__gc *gc, - uint32_t domid, char *state, + libxl_domid domid, + libxl_dmid dmid, + char *state, libxl__spawn_starting *spawning, int (*check_callback)(libxl__gc *gc, uint32_t domid, @@ -1289,9 +1297,14 @@ _hidden int libxl__wait_for_device_model(libxl__gc *gc, void *userdata), void *check_callback_userdata); -_hidden int libxl__destroy_device_model(libxl__gc *gc, uint32_t domid); +_hidden libxl_dmid *libxl__list_device_models(libxl__gc *gc, + libxl_domid domid, + unsigned int *num_dms); -_hidden const libxl_vnc_info *libxl__dm_vnc(const libxl_domain_config *g_cfg); +_hidden int libxl__destroy_device_models(libxl__gc *gc, libxl_domid domid); + +_hidden const libxl_vnc_info *libxl__dm_vnc(libxl_dmid dmid, + const libxl_domain_config *g_cfg); _hidden char *libxl__abs_path(libxl__gc *gc, const char *s, const char *path); @@ -2427,10 +2440,10 @@ struct libxl__dm_spawn_state { libxl_domain_config *guest_config; libxl__domain_build_state *build_state; /* relates to guest_domid */ libxl__dm_spawn_cb *callback; + libxl_dmid dmid; + struct libxl__domain_create_state *dcs; }; -_hidden void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state*); - /* Stubdom device models. 
*/ typedef struct { @@ -2447,7 +2460,7 @@ typedef struct { libxl__multidev multidev; } libxl__stub_dm_spawn_state; -_hidden void libxl__spawn_stub_dm(libxl__egc *egc, libxl__stub_dm_spawn_state*); +_hidden void libxl__spawn_dm(libxl__egc *egc, libxl__stub_dm_spawn_state*); _hidden char *libxl__stub_dm_name(libxl__gc *gc, const char * guest_name); @@ -2470,7 +2483,8 @@ struct libxl__domain_create_state { int guest_domid; libxl__domain_build_state build_state; libxl__bootloader_state bl; - libxl__stub_dm_spawn_state dmss; + libxl_dmid current_dmid; + libxl__stub_dm_spawn_state* dmss; /* If we''re not doing stubdom, we use only dmss.dm, * for the non-stubdom device model. */ libxl__save_helper_state shs; @@ -2527,7 +2541,9 @@ _hidden void libxl__domain_save_device_model(libxl__egc *egc, libxl__domain_suspend_state *dss, libxl__save_device_model_cb *callback); -_hidden const char *libxl__device_model_savefile(libxl__gc *gc, uint32_t domid); +_hidden const char *libxl__device_model_savefile(libxl__gc *gc, + libxl_domid domid, + libxl_dmid dmid); /* -- Julien Grall
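For readers following the NIC routing in this patch, here is a minimal standalone sketch of the lookup that libxl__dm_has_vif performs. It is not the libxl code: struct dm_config, DM_CAP_UI and dm_has_vif are hypothetical stand-ins for libxl_dm, LIBXL_DM_CAP_UI and the patch's helper. One assumed difference from the patch: a NIC without an id is handed only to the device model that has the UI capability, and the lookup returns early instead of passing a NULL name to strcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LIBXL_DM_CAP_UI. */
#define DM_CAP_UI (1u << 0)

/* Hypothetical stand-in for the libxl_dm fields used by the lookup. */
struct dm_config {
    unsigned int capabilities;   /* bitmask of DM_CAP_* flags */
    const char *const *vifs;     /* NULL-terminated list of vif ids, or NULL */
};

/* Return 1 if this device model should emulate the NIC named vifname. */
static int dm_has_vif(const struct dm_config *dm, const char *vifname)
{
    size_t i;

    assert(dm != NULL);

    /* Assumption: a NIC without an id falls back to the UI device model. */
    if (!vifname)
        return (dm->capabilities & DM_CAP_UI) != 0;

    /* A device model that lists no vifs emulates no named NIC. */
    if (!dm->vifs)
        return 0;

    for (i = 0; dm->vifs[i]; i++) {
        if (strcmp(dm->vifs[i], vifname) == 0)
            return 1;
    }

    return 0;
}
```

With the configuration from the cover letter (''name=qnet,vifs=nic1'' and ''name=qall,ui,ide''), this lookup would route nic1 to qnet and any NIC without an id to qall.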
Quick fix for the PCI library. For the moment, every hotplugged PCI device is added to QEMU 0. We need to find a better way to specify which QEMU handles the PCI device. Signed-off-by: Julien Grall <julien.grall@citrix.com> --- tools/libxl/libxl_pci.c | 19 +++++++++++-------- 1 files changed, 11 insertions(+), 8 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 48986f3..fe02ccd 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -834,13 +834,12 @@ static int qemu_pci_add_xenstore(libxl__gc *gc, uint32_t domid, } libxl__qemu_traditional_cmd(gc, domid, "pci-ins"); - rc = libxl__wait_for_device_model(gc, domid, NULL, NULL, + rc = libxl__wait_for_device_model(gc, domid, 0, NULL, NULL, pci_ins_check, state); path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/parameter", domid); vdevfn = libxl__xs_read(gc, XBT_NULL, path); - path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", - domid); + path = libxl__sprintf(gc, "/local/domain/0/dms/%d/state", domid); if ( rc < 0 ) LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "qemu refused to add device: %s", vdevfn); @@ -858,11 +857,13 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i { libxl_ctx *ctx = libxl__gc_owner(gc); int rc, hvm = 0; + /* FIXME: handle multiple device model */ + libxl_dmid dmid = 0; switch (libxl__domain_type(gc, domid)) { case LIBXL_DOMAIN_TYPE_HVM: hvm = 1; - if (libxl__wait_for_device_model(gc, domid, "running", + if (libxl__wait_for_device_model(gc, domid, dmid, "running", NULL, NULL, NULL) < 0) { return ERROR_FAIL; } @@ -871,7 +872,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, libxl_device_pci *pcidev, i rc = qemu_pci_add_xenstore(gc, domid, pcidev); break; case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: - rc = libxl__qmp_pci_add(gc, domid, pcidev); + rc = libxl__qmp_pci_add(gc, domid, dmid, pcidev); break; default: return ERROR_INVAL; @@ -1136,7 +1137,7 @@ static int qemu_pci_remove_xenstore(libxl__gc *gc, uint32_t domid, *
device-model for function 0 */ if ( !force && (pcidev->vdevfn & 0x7) == 0 ) { libxl__qemu_traditional_cmd(gc, domid, "pci-rem"); - if (libxl__wait_for_device_model(gc, domid, "pci-removed", + if (libxl__wait_for_device_model(gc, domid, 0, "pci-removed", NULL, NULL, NULL) < 0) { LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "Device Model didn't respond in time"); /* This depends on guest operating system acknowledging the @@ -1162,6 +1163,8 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid, libxl_device_pci *assigned; int hvm = 0, rc, num; int stubdomid = 0; + /* FIXME: Handle multiple device model */ + libxl_dmid dmid = 0; assigned = libxl_device_pci_list(ctx, domid, &num); if ( assigned == NULL ) @@ -1178,7 +1181,7 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid, switch (libxl__domain_type(gc, domid)) { case LIBXL_DOMAIN_TYPE_HVM: hvm = 1; - if (libxl__wait_for_device_model(gc, domid, "running", + if (libxl__wait_for_device_model(gc, domid, dmid, "running", NULL, NULL, NULL) < 0) goto out_fail; @@ -1187,7 +1190,7 @@ static int do_pci_remove(libxl__gc *gc, uint32_t domid, rc = qemu_pci_remove_xenstore(gc, domid, pcidev, force); break; case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: - rc = libxl__qmp_pci_del(gc, domid, dmid, pcidev); break; default: rc = ERROR_INVAL; -- Julien Grall
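A note on the xenstore layout touched by this patch: the spawn code earlier in the series watches /local/domain/0/dms/<domid>/<dmid>/state, whereas the qemu_pci_add_xenstore hunk in this patch formats dms/%d/state with the domain id only. A single helper would keep the two call sites consistent. The sketch below is only an illustration under that assumption; dm_state_path is a hypothetical name, not a libxl function, and it uses plain snprintf instead of libxl__sprintf so that it stands alone.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format the xenstore state path of one device model of a domain.
 * The layout follows the spawn code in this series:
 *   /local/domain/0/dms/<domid>/<dmid>/state
 * Returns 0 on success, -1 if the buffer is too small or on a
 * formatting error.
 */
static int dm_state_path(char *buf, size_t len,
                         unsigned int domid, unsigned int dmid)
{
    int n;

    assert(buf != NULL);

    n = snprintf(buf, len, "/local/domain/0/dms/%u/%u/state", domid, dmid);
    if (n < 0 || (size_t)n >= len)
        return -1;

    return 0;
}
```

Until the FIXMEs in do_pci_add and do_pci_remove are resolved, callers in the PCI code would presumably pass dmid 0, matching the hard-coded value in this patch.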
Julien Grall
2012-Aug-22 12:32 UTC
[XEN][RFC PATCH V2 17/17] xl: implement save/restore for multiple device models
Each device model is saved and restored one by one. Signed-off-by: Julien Grall <julien.grall@citrix.com> --- tools/libxl/libxl.c | 5 +- tools/libxl/libxl_dom.c | 143 ++++++++++++++++++++++++++++++++++-------- tools/libxl/libxl_internal.h | 16 +++-- 3 files changed, 130 insertions(+), 34 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 60718b6..e9d14e8 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -416,7 +416,7 @@ int libxl_domain_resume(libxl_ctx *ctx, uint32_t domid, int suspend_cancel) } if (type == LIBXL_DOMAIN_TYPE_HVM) { - rc = libxl__domain_resume_device_model(gc, domid); + rc = libxl__domain_resume_device_models(gc, domid); if (rc) { LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "failed to resume device model for domain %u:%d", @@ -852,8 +852,9 @@ int libxl_domain_unpause(libxl_ctx *ctx, uint32_t domid) path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid); state = libxl__xs_read(gc, XBT_NULL, path); if (state != NULL && !strcmp(state, "paused")) { + /* FIXME: handle multiple qemu */ libxl__qemu_traditional_cmd(gc, domid, "continue"); - libxl__wait_for_device_model(gc, domid, "running", + libxl__wait_for_device_model(gc, domid, 0, "running", NULL, NULL, NULL); } } diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index 475fea8..cd07140 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -582,6 +582,7 @@ int libxl__qemu_traditional_cmd(libxl__gc *gc, uint32_t domid, } struct libxl__physmap_info { + libxl_dmid device_model; uint64_t phys_offset; uint64_t start_addr; uint64_t size; @@ -640,6 +641,10 @@ int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf, pi = (struct libxl__physmap_info*) ptr; ptr += sizeof(struct libxl__physmap_info) + pi->namelen; + xs_path = restore_helper(gc, domid, pi->phys_offset, "device_model"); + ret = libxl__xs_write(gc, 0, xs_path, "%u", pi->device_model); + if (ret) + return -1; xs_path = restore_helper(gc, domid, pi->phys_offset,
"start_addr"); ret = libxl__xs_write(gc, 0, xs_path, "%"PRIx64, pi->start_addr); if (ret) @@ -839,27 +844,28 @@ static void switch_logdirty_done(libxl__egc *egc, /*----- callbacks, called by xc_domain_save -----*/ -int libxl__domain_suspend_device_model(libxl__gc *gc, - libxl__domain_suspend_state *dss) +static int libxl__domain_suspend_device_model(libxl__gc *gc, + libxl_dmid dmid, + libxl__domain_suspend_state *dss) { libxl_ctx *ctx = libxl__gc_owner(gc); int ret = 0; uint32_t const domid = dss->domid; - const char *const filename = dss->dm_savefile; + const char *const filename = libxl__device_model_savefile(gc, domid, dmid); switch (libxl__device_model_version_running(gc, domid)) { case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: { LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "Saving device model state to %s", filename); libxl__qemu_traditional_cmd(gc, domid, "save"); - libxl__wait_for_device_model(gc, domid, "paused", NULL, NULL, NULL); + libxl__wait_for_device_model(gc, domid, 0, "paused", NULL, NULL, NULL); break; } case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: - if (libxl__qmp_stop(gc, domid)) + if (libxl__qmp_stop(gc, domid, dmid)) return ERROR_FAIL; /* Save DM state into filename */ - ret = libxl__qmp_save(gc, domid, filename); + ret = libxl__qmp_save(gc, domid, dmid, filename); if (ret) unlink(filename); break; @@ -870,21 +876,67 @@ int libxl__domain_suspend_device_model(libxl__gc *gc, return ret; } -int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid) +int libxl__domain_suspend_device_models(libxl__gc *gc, + libxl__domain_suspend_state *dss) { + libxl_dmid *dms = NULL; + unsigned int num_dms = 0; + unsigned int i; + int ret; + + dms = libxl__list_device_models(gc, dss->domid, &num_dms); + + if (!dms) + return ERROR_FAIL; + + for (i = 0; i < num_dms; i++) + { + ret = libxl__domain_suspend_device_model(gc, dms[i], dss); + if (ret) + return ret; + } + + return 0; +} +static int libxl__domain_resume_device_model(libxl__gc *gc, libxl_domid domid, + 
libxl_dmid dmid) +{ switch (libxl__device_model_version_running(gc, domid)) { case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL: { libxl__qemu_traditional_cmd(gc, domid, "continue"); - libxl__wait_for_device_model(gc, domid, "running", NULL, NULL, NULL); + libxl__wait_for_device_model(gc, domid, dmid, "running", NULL, NULL, + NULL); break; } case LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN: - if (libxl__qmp_resume(gc, domid)) + if (libxl__qmp_resume(gc, domid, dmid)) return ERROR_FAIL; default: return ERROR_INVAL; } + return 0; +} + +int libxl__domain_resume_device_models(libxl__gc *gc, + libxl_domid domid) +{ + libxl_dmid *dms = NULL; + unsigned int num_dms = 0; + unsigned int i = 0; + int ret = 0; + + dms = libxl__list_device_models(gc, domid, &num_dms); + + if (!dms) + return ERROR_FAIL; + + for (i = 0; i < num_dms; i++) + { + ret = libxl__domain_resume_device_model(gc, domid, dms[i]); + if (ret) + return ret; + } return 0; } @@ -1014,9 +1066,10 @@ int libxl__domain_suspend_common_callback(void *user) guest_suspended: if (dss->hvm) { - ret = libxl__domain_suspend_device_model(gc, dss); + ret = libxl__domain_suspend_device_models(gc, dss); if (ret) { - LOG(ERROR, "libxl__domain_suspend_device_model failed ret=%d", ret); + LOG(ERROR, "libxl__domain_suspend_device_models failed ret=%d", + ret); return 0; } } @@ -1038,6 +1091,7 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf, STATE_AO_GC(dss->ao); int i = 0; char *start_addr = NULL, *size = NULL, *phys_offset = NULL, *name = NULL; + char *device_model = NULL; unsigned int num = 0; uint32_t count = 0, version = TOOLSTACK_SAVE_VERSION, namelen = 0; uint8_t *ptr = NULL; @@ -1068,6 +1122,13 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf, return -1; } + xs_path = physmap_path(gc, domid, phys_offset, "device_model"); + device_model = libxl__xs_read(gc, 0, xs_path); + if (device_model == NULL) { + LOG(ERROR, "%s is NULL", xs_path); + return -1; + } + xs_path = physmap_path(gc, domid, phys_offset, 
"start_addr"); start_addr = libxl__xs_read(gc, 0, xs_path); if (start_addr == NULL) { @@ -1095,6 +1156,7 @@ int libxl__toolstack_save(uint32_t domid, uint8_t **buf, return -1; ptr = (*buf) + offset; pi = (struct libxl__physmap_info *) ptr; + pi->device_model = strtol(device_model, NULL, 10); pi->phys_offset = strtoll(phys_offset, NULL, 16); pi->start_addr = strtoll(start_addr, NULL, 16); pi->size = strtoll(size, NULL, 16); @@ -1144,7 +1206,7 @@ static void libxl__remus_domain_checkpoint_callback(void *data) /* This would go into tailbuf. */ if (dss->hvm) { - libxl__domain_save_device_model(egc, dss, remus_checkpoint_dm_saved); + libxl__domain_save_device_models(egc, dss, remus_checkpoint_dm_saved); } else { remus_checkpoint_dm_saved(egc, dss, 0); } @@ -1207,7 +1269,6 @@ void libxl__domain_suspend(libxl__egc *egc, libxl__domain_suspend_state *dss) dss->suspend_eventchn = -1; dss->guest_responded = 0; - dss->dm_savefile = libxl__device_model_savefile(gc, domid); if (r_info != NULL) { dss->interval = r_info->interval; @@ -1274,10 +1335,10 @@ void libxl__xc_domain_save_done(libxl__egc *egc, void *dss_void, } if (type == LIBXL_DOMAIN_TYPE_HVM) { - rc = libxl__domain_suspend_device_model(gc, dss); + rc = libxl__domain_suspend_device_models(gc, dss); if (rc) goto out; - libxl__domain_save_device_model(egc, dss, domain_suspend_done); + libxl__domain_save_device_models(egc, dss, domain_suspend_done); return; } @@ -1290,29 +1351,30 @@ out: static void save_device_model_datacopier_done(libxl__egc *egc, libxl__datacopier_state *dc, int onwrite, int errnoval); -void libxl__domain_save_device_model(libxl__egc *egc, - libxl__domain_suspend_state *dss, - libxl__save_device_model_cb *callback) +static void libxl__domain_save_device_model(libxl__egc *egc, + libxl__domain_suspend_state *dss) { STATE_AO_GC(dss->ao); struct stat st; uint32_t qemu_state_len; + uint32_t num_dms = dss->num_dms; int rc; - - dss->save_dm_callback = callback; + libxl_dmid dmid = dss->dms[dss->current_dm]; /* 
Convenience aliases */ - const char *const filename = dss->dm_savefile; + const char *const filename = libxl__device_model_savefile(gc, dss->domid, + dmid); const int fd = dss->fd; libxl__datacopier_state *dc = &dss->save_dm_datacopier; memset(dc, 0, sizeof(*dc)); - dc->readwhat = GCSPRINTF("qemu save file %s", filename); + dc->readwhat = GCSPRINTF("qemu %u save file %s", dmid, filename); dc->ao = ao; dc->readfd = -1; dc->writefd = fd; dc->maxsz = INT_MAX; - dc->copywhat = GCSPRINTF("qemu save file for domain %"PRIu32, dss->domid); + dc->copywhat = GCSPRINTF("qemu %u save file for domain %"PRIu32, + dmid, dss->domid); dc->writewhat = "save/migration stream"; dc->callback = save_device_model_datacopier_done; @@ -1339,8 +1401,16 @@ void libxl__domain_save_device_model(libxl__egc *egc, rc = libxl__datacopier_start(dc); if (rc) goto out; + /* FIXME: Ugly fix to add DMS_SIGNATURE */ + if (dss->current_dm == 0) { + libxl__datacopier_prefixdata(egc, dc, + DMS_SIGNATURE, strlen(DMS_SIGNATURE)); + libxl__datacopier_prefixdata(egc, dc, + &num_dms, sizeof (num_dms)); + } + libxl__datacopier_prefixdata(egc, dc, - QEMU_SIGNATURE, strlen(QEMU_SIGNATURE)); + DM_SIGNATURE, strlen(DM_SIGNATURE)); libxl__datacopier_prefixdata(egc, dc, &qemu_state_len, sizeof(qemu_state_len)); @@ -1350,6 +1420,20 @@ void libxl__domain_save_device_model(libxl__egc *egc, save_device_model_datacopier_done(egc, dc, -1, 0); } +void libxl__domain_save_device_models(libxl__egc *egc, + libxl__domain_suspend_state *dss, + libxl__save_device_model_cb *callback) +{ + STATE_AO_GC(dss->ao); + + dss->save_dm_callback = callback; + dss->num_dms = 0; + dss->current_dm = 0; + dss->dms = libxl__list_device_models(gc, dss->domid, &dss->num_dms); + + libxl__domain_save_device_model(egc, dss); +} + static void save_device_model_datacopier_done(libxl__egc *egc, libxl__datacopier_state *dc, int onwrite, int errnoval) { @@ -1358,7 +1442,9 @@ static void save_device_model_datacopier_done(libxl__egc *egc, 
STATE_AO_GC(dss->ao); /* Convenience aliases */ - const char *const filename = dss->dm_savefile; + libxl_dmid dmid = dss->dms[dss->current_dm]; + const char *const filename = libxl__device_model_savefile(gc, dss->domid, + dmid); int our_rc = 0; int rc; @@ -1375,7 +1461,12 @@ static void save_device_model_datacopier_done(libxl__egc *egc, rc = libxl__remove_file(gc, filename); if (!our_rc) our_rc = rc; - dss->save_dm_callback(egc, dss, our_rc); + dss->current_dm++; + + if (!our_rc && dss->num_dms != dss->current_dm) + libxl__domain_save_device_model(egc, dss); + else + dss->save_dm_callback(egc, dss, our_rc); } static void domain_suspend_done(libxl__egc *egc, diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 2e6eedc..9de1465 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -93,7 +93,8 @@ #define LIBXL_MIN_DOM0_MEM (128*1024) /* use 0 as the domid of the toolstack domain for now */ #define LIBXL_TOOLSTACK_DOMID 0 -#define QEMU_SIGNATURE "DeviceModelRecord0002" +#define DMS_SIGNATURE "DeviceModelRecords001" +#define DM_SIGNATURE "DeviceModelRecord0002" #define STUBDOM_CONSOLE_LOGGING 0 #define STUBDOM_CONSOLE_SAVE 1 #define STUBDOM_CONSOLE_RESTORE 2 @@ -896,7 +897,8 @@ _hidden int libxl__domain_rename(libxl__gc *gc, uint32_t domid, _hidden int libxl__toolstack_restore(uint32_t domid, const uint8_t *buf, uint32_t size, void *data); -_hidden int libxl__domain_resume_device_model(libxl__gc *gc, uint32_t domid); +_hidden int libxl__domain_resume_device_models(libxl__gc *gc, + libxl_domid domid); _hidden void libxl__userdata_destroyall(libxl__gc *gc, uint32_t domid); @@ -2230,10 +2232,12 @@ struct libxl__domain_suspend_state { int hvm; int xcflags; int guest_responded; - const char *dm_savefile; int interval; /* checkpoint interval (for Remus) */ libxl__save_helper_state shs; libxl__logdirty_switch logdirty; + unsigned int num_dms; + unsigned int current_dm; + libxl_dmid *dms; /* private for 
libxl__domain_save_device_model */ libxl__save_device_model_cb *save_dm_callback; libxl__datacopier_state save_dm_datacopier; @@ -2535,9 +2539,9 @@ _hidden void libxl__xc_domain_restore_done(libxl__egc *egc, void *dcs_void, int rc, int retval, int errnoval); /* Each time the dm needs to be saved, we must call suspend and then save */ -_hidden int libxl__domain_suspend_device_model(libxl__gc *gc, - libxl__domain_suspend_state *dss); -_hidden void libxl__domain_save_device_model(libxl__egc *egc, +_hidden int libxl__domain_suspend_device_models(libxl__gc *gc, + libxl__domain_suspend_state *dss); +_hidden void libxl__domain_save_device_models(libxl__egc *egc, libxl__domain_suspend_state *dss, libxl__save_device_model_cb *callback); -- Julien Grall
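The save path in the patch above prefixes the stream with DMS_SIGNATURE and a 32-bit count once, then emits DM_SIGNATURE, a 32-bit length and the QEMU state for each device model. A minimal sketch of that layout, assuming host byte order for the two integers (the patch passes them by address with sizeof) and using hypothetical names (struct dm_state, put, dm_records_write) that are not part of libxl:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMS_SIGNATURE "DeviceModelRecords001"
#define DM_SIGNATURE  "DeviceModelRecord0002"

/* Hypothetical in-memory view of one device model's saved state. */
struct dm_state {
    const uint8_t *data;
    uint32_t len;
};

/* Append n bytes at *off if they fit; return 0 on success, -1 otherwise. */
static int put(uint8_t *buf, size_t cap, size_t *off,
               const void *src, size_t n)
{
    if (n > cap - *off)
        return -1;
    memcpy(buf + *off, src, n);
    *off += n;
    return 0;
}

/*
 * Write: DMS_SIGNATURE, num_dms, then for each device model
 * DM_SIGNATURE, length, state bytes. Signatures are written without
 * their terminating NUL, as strlen() is used in the patch.
 * Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t dm_records_write(uint8_t *buf, size_t cap,
                               const struct dm_state *dms, uint32_t num_dms)
{
    size_t off = 0;
    uint32_t i;

    if (put(buf, cap, &off, DMS_SIGNATURE, strlen(DMS_SIGNATURE)) ||
        put(buf, cap, &off, &num_dms, sizeof(num_dms)))
        return 0;

    for (i = 0; i < num_dms; i++) {
        if (put(buf, cap, &off, DM_SIGNATURE, strlen(DM_SIGNATURE)) ||
            put(buf, cap, &off, &dms[i].len, sizeof(dms[i].len)) ||
            put(buf, cap, &off, dms[i].data, dms[i].len))
            return 0;
    }
    return off;
}
```

A reader of such a stream would check the outer signature first, read the count, and then loop over that many inner records.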
Jan Beulich
2012-Aug-23 07:20 UTC
Re: [XEN][RFC PATCH V2 03/17] hvm-pci: Handle PCI config space in Xen
>>> Julien Grall <julien.grall@citrix.com> 08/22/12 8:56 PM >>>
>+int hvm_register_pcidev(domid_t domid, ioservid_t id,
>+ uint8_t domain, uint8_t bus,
>+ uint8_t device, uint8_t function)
>+{

"domain" needs to be "uint16_t".

Also, just to double check: we don't currently expose the option of MMCONFIG to the guest (as otherwise the change would be incomplete)?

Jan
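The point about "domain" follows from the PCI address layout: the segment (domain) number is 16 bits wide, the bus 8 bits, the device 5 bits and the function 3 bits. A minimal sketch, with a hypothetical helper name (pci_sbdf) that is not taken from the patch, packs these into one 32-bit value:

```c
#include <stdint.h>

/*
 * Hypothetical helper: pack a PCI address into a 32-bit
 * segment:bus:device.function value. The segment is 16 bits wide,
 * which is why a uint8_t parameter cannot represent all of them.
 */
static uint32_t pci_sbdf(uint16_t segment, uint8_t bus,
                         uint8_t device, uint8_t function)
{
    return ((uint32_t)segment << 16) |
           ((uint32_t)bus << 8) |
           ((uint32_t)(device & 0x1f) << 3) |
           (uint32_t)(function & 0x7);
}
```

With a uint8_t parameter, segment 0x0100 would be silently truncated to 0 and alias segment 0.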
>>> Julien Grall <julien.grall@citrix.com> 08/22/12 8:56 PM >>>
>@@ -4069,20 +4053,12 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE(void) arg)
>
>switch ( a.index )
>{
>- case HVM_PARAM_IOREQ_PFN:

Removing sub-ops which a domain can issue for itself (which for this and another one below appears to be the case) is not allowed.

>+ case HVM_PARAM_IO_PFN_FIRST:

I don't see where in this patch this and the other new sub-op constants get defined.

Jan
On 08/23/2012 08:27 AM, Jan Beulich wrote:
>> switch ( a.index )
>> {
>> - case HVM_PARAM_IOREQ_PFN:
>>
> Removing sub-ops which a domain can issue for itself (which for this and
> another one below appears to be the case) is not allowed.
>

I removed these 3 sub-ops because they will not work with QEMU disaggregation. The shared pages and the event channel for I/O requests are private to each device model.

>> + case HVM_PARAM_IO_PFN_FIRST:
>>
> I don't see where in this patch this and the other new sub-op constants
> get defined.
>

Both sub-op constants are added in patch 1:
http://lists.xen.org/archives/html/xen-devel/2012-08/msg01767.html
Ian Campbell
2012-Aug-23 13:18 UTC
Re: [XEN][RFC PATCH V2 01/17] hvm: Modify interface to support multiple ioreq server
> diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
> index 27b3de5..49d1ca0 100644
> --- a/xen/include/asm-x86/hvm/domain.h
> +++ b/xen/include/asm-x86/hvm/domain.h
[...]
> struct hvm_domain {
> + /* Use for the IO handles by Xen */
> struct hvm_ioreq_page ioreq;
> - struct hvm_ioreq_page buf_ioreq;
> + struct hvm_ioreq_server *ioreq_server_list;
> + uint32_t nr_ioreq_server;
> + spinlock_t ioreq_server_lock;

There's some whitespace weirdness here plus some in xen/include/asm-x86/hvm/vcpu.h and xen/include/public/hvm/hvm_op.h.

> diff --git a/xen/include/public/hvm/ioreq.h b/xen/include/public/hvm/ioreq.h
> index 4022a1d..87aacd3 100644
> --- a/xen/include/public/hvm/ioreq.h
> +++ b/xen/include/public/hvm/ioreq.h
> @@ -34,6 +34,7 @@
>
> #define IOREQ_TYPE_PIO 0 /* pio */
> #define IOREQ_TYPE_COPY 1 /* mmio ops */
> +#define IOREQ_TYPE_PCI_CONFIG 2 /* pci config space ops */
> #define IOREQ_TYPE_TIMEOFFSET 7
> #define IOREQ_TYPE_INVALIDATE 8 /* mapcache */

I wonder why we skip 2-6 now -- perhaps they used to be something else and we are avoiding them to avoid strange errors? In which case adding the new one as 9 might be a good idea.

Ian.
Ian Campbell
2012-Aug-23 13:21 UTC
Re: [XEN][RFC PATCH V2 09/17] xc: Add the hypercall for multiple servers
On Wed, 2012-08-22 at 13:31 +0100, Julien Grall wrote:
> This patch add 5 hypercalls to register server, io range and PCI.
>
> Signed-off-by: Julien Grall <julien.grall@citrix.com>

Looks correct to me at least so far as the use of the hypercall buffers goes, thanks.

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell
2012-Aug-23 13:27 UTC
Re: [XEN][RFC PATCH V2 11/17] xc: modify save/restore to support multiple device models
On Wed, 2012-08-22 at 13:31 +0100, Julien Grall wrote:
> - add save/restore new special pages and remove unused
> - modify save file structure to allow multiple qemu states
>
> Signed-off-by: Julien Grall <julien.grall@citrix.com>
> ---
> tools/libxc/xc_domain_restore.c | 150 +++++++++++++++++++++++++++++----------
> tools/libxc/xc_domain_save.c | 6 +-

As you've changed the protocol, please can you update the docs in xg_save_restore.h.

> @@ -103,6 +103,9 @@ static ssize_t rdexact(xc_interface *xch, struct restore_ctx *ctx,
> #else
> #define RDEXACT read_exact
> #endif
> +
> +#define QEMUSIG_SIZE 21
> +
> /*
> ** In the state file (or during transfer), all page-table pages are
> ** converted into a 'canonical' form where references to actual mfns
> @@ -467,7 +522,7 @@ static int buffer_tail_hvm(xc_interface *xch, struct restore_ctx *ctx,
> int vcpuextstate, uint32_t vcpuextstate_size)
> {
> uint8_t *tmp;
> - unsigned char qemusig[21];
> + unsigned char qemusig[QEMUSIG_SIZE + 1];

An extra + 1 here?

[...]
> - qemusig[20] = '\0';
> + qemusig[QEMUSIG_SIZE] = '\0';

This is one bigger than it used to be now.

Perhaps this is an unrelated bug fix (I haven't checked the real length of the sig), in which case please can you split it out and submit separately?

Ian.
Ian Campbell
2012-Aug-23 13:30 UTC
Re: [XEN][RFC PATCH V2 12/17] xl: Add interface to handle qemu disaggregation
On Wed, 2012-08-22 at 13:31 +0100, Julien Grall wrote:
> This patch modifies libxl interface for qemu disaggregation.

I'd rather see the interface changes in the same patch as the implementation of the new interfaces.

> For the moment, due to some dependencies between devices, we
> can't let the user choose which QEMU emulate a device.
>
> Moreoever this patch adds an "id" field to nic interface.
> It will be used in config file to specify which QEMU handle
> the network card.

Is domid+devid not sufficient to identify which nic?

> A possible disaggregation is:
> - UI: Emulate graphic card, USB, keyboard, mouse, default devices
> (PIIX4, root bridge, ...)
> - IDE: Emulate disk
> - Serial: Emulate serial port
> - Audio: Emulate audio card
> - Net: Emulate one or more network cards, multiple QEMU can emulate
> different card. The emulated card is specified with its nic ID.
>
> Signed-off-by: Julien Grall <julien.grall@citrix.com>
> ---
> tools/libxl/libxl.h | 3 +++
> tools/libxl/libxl_types.idl | 15 +++++++++++++++
> 2 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index c614d6f..71d4808 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -307,6 +307,7 @@ void libxl_cpuid_dispose(libxl_cpuid_policy_list *cpuid_list);
> #define LIBXL_PCI_FUNC_ALL (~0U)
>
> typedef uint32_t libxl_domid;
> +typedef uint32_t libxl_dmid;
>
> /*
> * Formatting Enumerations.
> @@ -478,12 +479,14 @@ typedef struct {
> libxl_domain_build_info b_info;
>
> int num_disks, num_nics, num_pcidevs, num_vfbs, num_vkbs;
> + int num_dms;
>
> libxl_device_disk *disks;
> libxl_device_nic *nics;
> libxl_device_pci *pcidevs;
> libxl_device_vfb *vfbs;
> libxl_device_vkb *vkbs;
> + libxl_dm *dms;
>
> libxl_action_on_shutdown on_poweroff;
> libxl_action_on_shutdown on_reboot;
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index daa8c79..36c802a 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -246,6 +246,20 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
> ("extratime", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
> ])
>
> +libxl_dm_cap = Enumeration("dm_cap", [
> + (1, "UI"), # Emulate all UI + default device

What does "default device" equate to?

> + (2, "IDE"), # Emulate IDE
> + (4, "SERIAL"), # Emulate Serial
> + (8, "AUDIO"), # Emulate audio
> + ])
> +
> +libxl_dm = Struct("dm", [
> + ("name", string),
> + ("path", string),
> + ("capabilities", uint64),

uint64 and not libxl_dm_cap?

> + ("vifs", libxl_string_list),
> + ])
> +
> libxl_domain_build_info = Struct("domain_build_info",[
> ("max_vcpus", integer),
> ("avail_vcpus", libxl_bitmap),
> @@ -367,6 +381,7 @@ libxl_device_nic = Struct("device_nic", [
> ("nictype", libxl_nic_type),
> ("rate_bytes_per_interval", uint64),
> ("rate_interval_usecs", uint32),
> + ("id", string),
> ])
>
> libxl_device_pci = Struct("device_pci", [
Ian Campbell
2012-Aug-23 13:35 UTC
Re: [XEN][RFC PATCH V2 14/17] xl-parsing: Parse new device_models option
On Wed, 2012-08-22 at 13:32 +0100, Julien Grall wrote:> Add new option "device_models". The user can specify the capability of the > QEMU (ui, vifs, ...). This option only works with QEMU upstream (qemu-xen). > > For instance: > device_models= [ ''name=all,vifs=nic1'', ''name=qvga,ui'', ''name=qide,ide'' ]Please can you patch docs/man/xl.cfg.pod.5 with a description of this syntax. Possibly just a stub referencing docs/man/xl-device-models.markdown in the same manner as xl-disk-configuration.txt, xl-numa-placement.markdown, xl-network-configuration.markdown etc. iirc you can give multiple vifs -- what does that syntax look like? I didn''t ask before -- what does naming the dm give you? Is it just used for ui things like logging or can you cross reference this in some way?> Each device model can also take a path argument which override the default one. > It''s usefull for debugging.useful> > Signed-off-by: Julien Grall <julien.grall@citrix.com> > --- > tools/libxl/Makefile | 2 +- > tools/libxl/libxlu_dm.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++ > tools/libxl/libxlutil.h | 5 ++ > tools/libxl/xl_cmdimpl.c | 29 +++++++++++++- > 4 files changed, 130 insertions(+), 2 deletions(-) > create mode 100644 tools/libxl/libxlu_dm.c > > diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile > index 47fb110..2b58721 100644 > --- a/tools/libxl/Makefile > +++ b/tools/libxl/Makefile > @@ -79,7 +79,7 @@ AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h \ > AUTOSRCS= libxlu_cfg_y.c libxlu_cfg_l.c > AUTOSRCS += _libxl_save_msgs_callout.c _libxl_save_msgs_helper.c > LIBXLU_OBJS = libxlu_cfg_y.o libxlu_cfg_l.o libxlu_cfg.o \ > - libxlu_disk_l.o libxlu_disk.o libxlu_vif.o libxlu_pci.o > + libxlu_disk_l.o libxlu_disk.o libxlu_vif.o libxlu_pci.o libxlu_dm.o > $(LIBXLU_OBJS): CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h > > CLIENTS = xl testidl libxl-save-helper > diff --git a/tools/libxl/libxlu_dm.c b/tools/libxl/libxlu_dm.c > new file mode 100644 > index 
0000000..9f0a347 > --- /dev/null > +++ b/tools/libxl/libxlu_dm.c > @@ -0,0 +1,96 @@ > +#include "libxl_osdeps.h" /* must come before any other headers */ > +#include <stdlib.h> > +#include "libxlu_internal.h" > +#include "libxlu_cfg_i.h" > + > +static void split_string_into_string_list(const char *str, > + const char *delim, > + libxl_string_list *psl)Is this a cut-n-paste of the one in xl_cmdimpl.c or did it change? Probably better to add this as a common utility function somewhere.> +{ > [...] > +} > + > +int xlu_dm_parse(XLU_Config *cfg, const char *spec, > + libxl_dm *dm) > +{ > + char *buf = strdup(spec); > + char *p, *p2; > + int rc = 0; > + > + p = strtok(buf, ","); > + if (!p) > + goto skip_dm; > + do { > + while (*p == '' '') > + p++; > + if ((p2 = strchr(p, ''='')) == NULL) { > + if (!strcmp(p, "ui"))libxl provides a libxl_BLAH_from_string for enums in the idl, which might be helpful here?> + dm->capabilities |= LIBXL_DM_CAP_UI; > + else if (!strcmp(p, "ide")) > + dm->capabilities |= LIBXL_DM_CAP_IDE; > + else if (!strcmp(p, "serial")) > + dm->capabilities |= LIBXL_DM_CAP_SERIAL; > + else if (!strcmp(p, "audio")) > + dm->capabilities |= LIBXL_DM_CAP_AUDIO; > + } else { > + *p2 = ''\0''; > + if (!strcmp(p, "name")) > + dm->name = strdup(p2 + 1); > + else if (!strcmp(p, "path")) > + dm->path = strdup(p2 + 1); > + else if (!strcmp(p, "vifs")) > + split_string_into_string_list(p2 + 1, ";", &dm->vifs); > + } > + } while ((p = strtok(NULL, ",")) != NULL); > + > + if (!dm->name && dm->path) > + { > + fprintf(stderr, "xl: Unable to parse device_deamon\n"); > + exit(-ERROR_FAIL); > + } > +skip_dm: > + free(buf); > + > + return rc; > +} > diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h > index 0333e55..db22715 100644 > --- a/tools/libxl/libxlutil.h > +++ b/tools/libxl/libxlutil.h > @@ -93,6 +93,11 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char *const *specs, > */ > int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const 
char *str); > > +/* > + * Daemon specification parsing. > + */ > +int xlu_dm_parse(XLU_Config *cfg, const char *spec, > + libxl_dm *dm); > > /* > * Vif rate parsing. > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c > index 138cd72..2a26fa4 100644 > --- a/tools/libxl/xl_cmdimpl.c > +++ b/tools/libxl/xl_cmdimpl.c > @@ -561,7 +561,7 @@ static void parse_config_data(const char *config_source, > const char *buf; > long l; > XLU_Config *config; > - XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids; > + XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *dms; > int pci_power_mgmt = 0; > int pci_msitranslate = 0; > int pci_permissive = 0; > @@ -995,6 +995,9 @@ static void parse_config_data(const char *config_source, > } else if (!strcmp(p, "vifname")) { > free(nic->ifname); > nic->ifname = strdup(p2 + 1); > + } else if (!strcmp(p, "id")) { > + free(nic->id); > + nic->id = strdup(p2 + 1); > } else if (!strcmp(p, "backend")) { > if(libxl_name_to_domid(ctx, (p2 + 1), &(nic->backend_domid))) { > fprintf(stderr, "Specified backend domain does not exist, defaulting to Dom0\n"); > @@ -1249,6 +1252,30 @@ skip_vfb: > } > } > } > + > + d_config->num_dms = 0; > + d_config->dms = NULL; > + > + if (b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN > + && !xlu_cfg_get_list (config, "device_models", &dms, 0, 0)) { > + while ((buf = xlu_cfg_get_listitem (dms, d_config->num_dms)) != NULL) { > + libxl_dm *dm; > + size_t size = sizeof (libxl_dm) * (d_config->num_dms + 1); > + > + d_config->dms = (libxl_dm *)realloc (d_config->dms, size); > + if (!d_config->dms) { > + fprintf(stderr, "Can''t realloc d_config->dms\n"); > + exit (1); > + } > + dm = d_config->dms + d_config->num_dms; > + libxl_dm_init (dm); > + if (xlu_dm_parse(config, buf, dm)) { > + exit (-ERROR_FAIL); > + } > + d_config->num_dms++; > + } > + } > + > #define parse_extra_args(type) \ > e = xlu_cfg_get_list_as_string_list(config, "device_model_args"#type, \ > 
&b_info->extra##type, 0); \
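The review comment above suggests replacing the strcmp chain in xlu_dm_parse with a generated string-to-enum helper. As a rough illustration of the idea only, here is a table-driven lookup. The names (struct dm_cap_name, dm_cap_from_string) are hypothetical and are not the helper libxl generates; the values mirror the dm_cap enumeration quoted from the IDL patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name/value table mirroring the quoted dm_cap enumeration. */
struct dm_cap_name {
    const char *name;
    uint64_t value;
};

static const struct dm_cap_name dm_cap_names[] = {
    { "ui",     1 },
    { "ide",    2 },
    { "serial", 4 },
    { "audio",  8 },
};

/*
 * Look up a capability keyword. Returns 0 and stores the flag in *cap
 * on success, -1 if the keyword is unknown (*cap is left untouched).
 */
static int dm_cap_from_string(const char *s, uint64_t *cap)
{
    size_t i;

    for (i = 0; i < sizeof(dm_cap_names) / sizeof(dm_cap_names[0]); i++) {
        if (!strcmp(s, dm_cap_names[i].name)) {
            *cap = dm_cap_names[i].value;
            return 0;
        }
    }
    return -1;
}
```

One side effect of a lookup that reports failure is that the caller can reject a misspelled keyword, whereas the if/else chain as quoted appears to ignore unknown bare tokens.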
Ian Campbell
2012-Aug-23 13:56 UTC
Re: [XEN][RFC PATCH V2 15/17] xl: support spawn/destroy on multiple device model
On Wed, 2012-08-22 at 13:32 +0100, Julien Grall wrote:> Old configuration file is still working with qemu disaggregation. > Before to spawn any QEMU, the toolstack will fill correctly, if needed, > configuration structure. > > For the moment, the toolstack spawns device models one by one. > > Signed-off-by: Julien Grall <julien.grall@citrix.com> > --- > tools/libxl/libxl.c | 16 ++- > tools/libxl/libxl_create.c | 150 +++++++++++++----- > tools/libxl/libxl_device.c | 7 +- > tools/libxl/libxl_dm.c | 369 ++++++++++++++++++++++++++++++------------ > tools/libxl/libxl_dom.c | 4 +- > tools/libxl/libxl_internal.h | 36 +++-- > 6 files changed, 421 insertions(+), 161 deletions(-) > > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c > index 8ea3478..60718b6 100644 > --- a/tools/libxl/libxl.c > +++ b/tools/libxl/libxl.c > @@ -1330,7 +1330,8 @@ static void stubdom_destroy_callback(libxl__egc *egc, > } > > dds->stubdom_finished = 1; > - savefile = libxl__device_model_savefile(gc, dis->domid); > + /* FIXME: get dmid */ > + savefile = libxl__device_model_savefile(gc, dis->domid, 0); > rc = libxl__remove_file(gc, savefile); > /* > * On suspend libxl__domain_save_device_model will have already > @@ -1423,10 +1424,8 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis) > LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_pause failed for %d", domid); > } > if (dm_present) { > - if (libxl__destroy_device_model(gc, domid) < 0) > - LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl__destroy_device_model failed for %d", domid); > - > - libxl__qmp_cleanup(gc, domid); > + if (libxl__destroy_device_models(gc, domid) < 0) > + LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "libxl__destroy_device_models failed for %d", domid); > } > dis->drs.ao = ao; > dis->drs.domid = domid; > @@ -1725,6 +1724,13 @@ out: > > /******************************************************************************/ > > +int libxl__dm_setdefault(libxl__gc *gc, libxl_dm *dm) > +{ > + return 0; > +} > + > 
+/******************************************************************************/ > + > int libxl__device_disk_setdefault(libxl__gc *gc, libxl_device_disk *disk) > { > int rc; > diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c > index 5f0d26f..7160c78 100644 > --- a/tools/libxl/libxl_create.c > +++ b/tools/libxl/libxl_create.c > @@ -35,6 +35,10 @@ void libxl_domain_config_dispose(libxl_domain_config *d_config) > { > int i; > > + for (i=0; i<d_config->num_dms; i++) > + libxl_dm_dispose(&d_config->dms[i]); > + free(d_config->dms);We are adding libxl_FOO_list_free functions for new ones of these as we introduce new ones), can you do that for the dm type please.> + > for (i=0; i<d_config->num_disks; i++) > libxl_device_disk_dispose(&d_config->disks[i]); > free(d_config->disks); > @@ -59,6 +63,50 @@ void libxl_domain_config_dispose(libxl_domain_config *d_config) > libxl_domain_build_info_dispose(&d_config->b_info); > } > > +static int libxl__domain_config_setdefault(libxl__gc *gc, > + libxl_domain_config *d_config) > +{ > + libxl_domain_build_info *b_info = &d_config->b_info; > + uint64_t cap = 0; > + int i = 0; > + int ret = 0; > + libxl_dm *default_dm = NULL; > + > + if (b_info->device_model_version == LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL > + && (d_config->num_dms > 1)) > + return ERROR_INVAL; > + > + if (!d_config->num_dms) { > + d_config->dms = malloc(sizeof (*d_config->dms));You should use libxl__zalloc or libxl__calloc or something here with the NO_GC argument to get the expected error handling.> @@ -991,12 +1057,11 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev, > libxl__device_console_dispose(&console); > > if (need_qemu) { > - dcs->dmss.dm.guest_domid = domid; > - libxl__spawn_local_dm(egc, &dcs->dmss.dm); > + assert(dcs->dmss); > + domcreate_spawn_devmodel(egc, dcs, dcs->current_dmid); > return; > } else { > - assert(!dcs->dmss.dm.guest_domid); > - domcreate_devmodel_started(egc, &dcs->dmss.dm, 0); 
> + assert(!dcs->dmss);Doesn''t this stop progress in this case meaning we''ll never get to the end of the async op?> return; > } > }[..]> @@ -1044,7 +1044,8 @@ int libxl__wait_for_device_model(libxl__gc *gc, > void *check_callback_userdata) > { > char *path; > - path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid); > + path = libxl__sprintf(gc, "/local/domain/0/dms/%u/%u/state", > + domid, dmid);Isn''t this control path shared with qemu? I''m not sure we can just change it like that? We need to at least retain compatibility with pre-disag qemus.> return libxl__wait_for_offspring(gc, domid, > LIBXL_DEVICE_MODEL_START_TIMEOUT, > "Device Model", path, state, spawning,> const char *libxl__domain_device_model(libxl__gc *gc, > - const libxl_domain_build_info *info) > + uint32_t dmid, > + const libxl_domain_build_info *b_info) > { > libxl_ctx *ctx = libxl__gc_owner(gc); > const char *dm; > + libxl_domain_config *guest_config = CONTAINER_OF(b_info, *guest_config, > + b_info); > > - if (libxl_defbool_val(info->device_model_stubdomain)) > + if (libxl_defbool_val(guest_config->b_info.device_model_stubdomain))You just extracted guest_config from b_info but you still have the b_info point to hand. Why not use it? Likewise a few more times below.> { > + /**The ** implies some sort of automagic comments->doc parsing process which we don''t have here.> + * PCI device number. Before 3, we have IDE, ISA, SouthBridge and > + * XEN PCI. Theses devices will be emulate in each QEMU, but only > + * one QEMU (the one which emulates default device) will register > + * these devices through Xen PCI hypercall. > + */ > + static unsigned int bdf = 3;Do you mean const rather than static? Isn''t this baking in some implementation detail from the current qemu version? 
What happens if it changes?> + > libxl_ctx *ctx = libxl__gc_owner(gc); > const libxl_domain_create_info *c_info = &guest_config->c_info; > const libxl_domain_build_info *b_info = &guest_config->b_info; > + const libxl_dm *dm_config = &guest_config->dms[dmid]; > const libxl_device_disk *disks = guest_config->disks; > const libxl_device_nic *nics = guest_config->nics; > const int num_disks = guest_config->num_disks; > const int num_nics = guest_config->num_nics; > - const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config); > + const libxl_vnc_info *vnc = libxl__dm_vnc(dmid, guest_config); > const libxl_sdl_info *sdl = dm_sdl(guest_config); > const char *keymap = dm_keymap(guest_config); > flexarray_t *dm_args; > int i; > uint64_t ram_size; > + uint32_t cap_ui = dm_config->capabilities & LIBXL_DM_CAP_UI; > + uint32_t cap_ide = dm_config->capabilities & LIBXL_DM_CAP_IDE; > + uint32_t cap_serial = dm_config->capabilities & LIBXL_DM_CAP_SERIAL; > + uint32_t cap_audio = dm_config->capabilities & LIBXL_DM_CAP_AUDIO;->capabilities is defined as 64 bits, but you use 32 here, which happens to work if you know what the actual values of the enum are but whoever adds the 33rd capability will probably get it wrong. bool cap_foo = !! (dm_....capabiltieis & LIBXL_DM_CAP_FOO) would probably work?> > dm_args = flexarray_make(16, 1); > if (!dm_args) > @@ -348,11 +389,12 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, > "-xen-domid", > libxl__sprintf(gc, "%d", guest_domid), NULL); > > + flexarray_append(dm_args, "-nodefaults");Does this not cause a change in behaviour other than what you''ve accounted for here?> @@ -528,65 +583,69 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, > abort(); > } > > - ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb); > + // Allocate ram space of 32Mo per previous device model to store romWhat is this about? (also that Mo looks a bit odd in among all these mb''s) Ian.
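The remark about storing a masked 64-bit capability field in a uint32_t can be shown in isolation. This is a minimal sketch with made-up flag names (CAP_UI, CAP_FUTURE) and helper names; bit 32 stands in for a hypothetical 33rd capability that does not exist in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define CAP_UI     (UINT64_C(1) << 0)
#define CAP_FUTURE (UINT64_C(1) << 32)  /* hypothetical 33rd capability */

/* Narrowing the masked value to 32 bits drops any flag above bit 31. */
static uint32_t has_cap_u32(uint64_t caps, uint64_t flag)
{
    return (uint32_t)(caps & flag);
}

/* Normalising with !! keeps only "set or not set", whatever the bit. */
static bool has_cap(uint64_t caps, uint64_t flag)
{
    return !!(caps & flag);
}
```

For CAP_UI both helpers report the flag as set; for CAP_FUTURE the 32-bit variant returns 0 while the bool variant returns true.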
Julien Grall
2012-Aug-23 19:13 UTC
Re: [XEN][RFC PATCH V2 11/17] xc: modify save/restore to support multiple device models
On 08/23/2012 02:27 PM, Ian Campbell wrote:
>
>> @@ -103,6 +103,9 @@ static ssize_t rdexact(xc_interface *xch, struct restore_ctx *ctx,
>> #else
>> #define RDEXACT read_exact
>> #endif
>> +
>> +#define QEMUSIG_SIZE 21
>> +
>> /*
>> ** In the state file (or during transfer), all page-table pages are
>> ** converted into a 'canonical' form where references to actual mfns
>> @@ -467,7 +522,7 @@ static int buffer_tail_hvm(xc_interface *xch, struct restore_ctx *ctx,
>> int vcpuextstate, uint32_t vcpuextstate_size)
>> {
>> uint8_t *tmp;
>> - unsigned char qemusig[21];
>> + unsigned char qemusig[QEMUSIG_SIZE + 1];
>>
> An extra + 1 here?
>

QEMUSIG_SIZE doesn't take into account the '\0'. So we need to add 1.
If an error occurred, without +1, the output log lost the last character.

> [...]
>
>> - qemusig[20] = '\0';
>> + qemusig[QEMUSIG_SIZE] = '\0';
>>
> This is one bigger than it used to be now.
>
> Perhaps this is an unrelated bug fix (I haven't check the real length of
> the sig), in which case please can you split it out and submit
> separately?
>

#define QEMU_SIGNATURE "DeviceModelRecord0002"

Just checked, the length seems to be 21. I will send a patch with this change.

--
Julien
Ian Campbell
2012-Aug-23 19:52 UTC
Re: [XEN][RFC PATCH V2 11/17] xc: modify save/restore to support multiple device models
On Thu, 2012-08-23 at 20:13 +0100, Julien Grall wrote:
> On 08/23/2012 02:27 PM, Ian Campbell wrote:
> >
> >> @@ -103,6 +103,9 @@ static ssize_t rdexact(xc_interface *xch, struct restore_ctx *ctx,
> >> #else
> >> #define RDEXACT read_exact
> >> #endif
> >> +
> >> +#define QEMUSIG_SIZE 21
> >> +
> >> /*
> >> ** In the state file (or during transfer), all page-table pages are
> >> ** converted into a 'canonical' form where references to actual mfns
> >> @@ -467,7 +522,7 @@ static int buffer_tail_hvm(xc_interface *xch, struct restore_ctx *ctx,
> >> int vcpuextstate, uint32_t vcpuextstate_size)
> >> {
> >> uint8_t *tmp;
> >> - unsigned char qemusig[21];
> >> + unsigned char qemusig[QEMUSIG_SIZE + 1];
> >>
> > An extra + 1 here?
> >
> QEMUSIG_SIZE doesn't take into account the '\0'. So we need to add 1.
> If an error occurred, without +1, the output log lost the last character.

So this is just a bug fix for a pre-existing issue?

> > [...]
> >
> >> - qemusig[20] = '\0';
> >> + qemusig[QEMUSIG_SIZE] = '\0';
> >>
> > This is one bigger than it used to be now.
> >
> > Perhaps this is an unrelated bug fix (I haven't check the real length of
> > the sig), in which case please can you split it out and submit
> > separately?
> >
>
> #define QEMU_SIGNATURE "DeviceModelRecord0002"
> Just checked, the length seems to be 21. I will send a patch with
> this change.

Perhaps use either sizeof(QEMU_SIGNATURE) or strlen(QEMU_SIGNATURE) (depending on which semantics you want)?

Ian.
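The two semantics mentioned in the reply above differ by exactly the terminating NUL: for a string literal, sizeof counts it and strlen does not. A small sketch follows, with hypothetical names (QEMUSIG_LEN, sig_matches) that are not taken from libxc; it sizes the buffer with sizeof and copies with the strlen-equivalent length.

```c
#include <string.h>

#define QEMU_SIGNATURE "DeviceModelRecord0002"

/* Bytes of signature in the stream: the literal minus its NUL. */
enum { QEMUSIG_LEN = sizeof(QEMU_SIGNATURE) - 1 };

/*
 * Copy QEMUSIG_LEN bytes from the stream into out, terminate them so
 * they can be logged as a string, and compare with the expected
 * signature. out must hold sizeof(QEMU_SIGNATURE) bytes.
 * Returns 1 on a match, 0 otherwise.
 */
static int sig_matches(const unsigned char *stream,
                       char out[sizeof(QEMU_SIGNATURE)])
{
    memcpy(out, stream, QEMUSIG_LEN);
    out[QEMUSIG_LEN] = '\0';
    return strcmp(out, QEMU_SIGNATURE) == 0;
}
```

Deriving both sizes from the literal avoids a hand-counted constant such as 21 drifting out of step with the signature.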
Julien Grall
2012-Aug-24 10:27 UTC
Re: [XEN][RFC PATCH V2 11/17] xc: modify save/restore to support multiple device models
On 08/23/2012 08:52 PM, Ian Campbell wrote:
> On Thu, 2012-08-23 at 20:13 +0100, Julien Grall wrote:
>
>> On 08/23/2012 02:27 PM, Ian Campbell wrote:
>>
>>>
>>>> @@ -103,6 +103,9 @@ static ssize_t rdexact(xc_interface *xch, struct restore_ctx *ctx,
>>>> #else
>>>> #define RDEXACT read_exact
>>>> #endif
>>>> +
>>>> +#define QEMUSIG_SIZE 21
>>>> +
>>>> /*
>>>> ** In the state file (or during transfer), all page-table pages are
>>>> ** converted into a 'canonical' form where references to actual mfns
>>>> @@ -467,7 +522,7 @@ static int buffer_tail_hvm(xc_interface *xch, struct restore_ctx *ctx,
>>>> int vcpuextstate, uint32_t vcpuextstate_size)
>>>> {
>>>> uint8_t *tmp;
>>>> - unsigned char qemusig[21];
>>>> + unsigned char qemusig[QEMUSIG_SIZE + 1];
>>>>
>>>>
>>> An extra + 1 here?
>>>
>>>
>> QEMUSIG_SIZE doesn't take into account the '\0'. So we need to add 1.
>> If an error occurred, without +1, the output log lost the last character.
>>
> So this is just a bug fix for a pre-existing issue?
>

Yes.

>>> [...]
>>>
>>>
>>>> - qemusig[20] = '\0';
>>>> + qemusig[QEMUSIG_SIZE] = '\0';
>>>>
>>>>
>>> This is one bigger than it used to be now.
>>>
>>> Perhaps this is an unrelated bug fix (I haven't check the real length of
>>> the sig), in which case please can you split it out and submit
>>> separately?
>>>
>>>
>> #define QEMU_SIGNATURE "DeviceModelRecord0002"
>> Just checked, the length seems to be 21. I will send a patch with
>> this change.
>>
> Perhaps use either sizeof(QEMU_SIGNATURE) or strlen(QEMU_SIGNATURE)
> (depending on which semantics you want)?
>

Here, QEMU_SIZE needs to be defined as strlen(QEMU_SIGNATURE), but QEMU_SIGNATURE is not defined in libxc. It's defined in libxl/libxl_internal.h.

By the way, I'm wondering why the QEMU save (libxl__domain_save_device_model) is done in libxl and the restore (dump_qemu) in libxc?
Ian Campbell
2012-Aug-24 10:35 UTC
Re: [XEN][RFC PATCH V2 11/17] xc: modify save/restore to support multiple device models
On Fri, 2012-08-24 at 11:27 +0100, Julien Grall wrote:
> On 08/23/2012 08:52 PM, Ian Campbell wrote:
> > On Thu, 2012-08-23 at 20:13 +0100, Julien Grall wrote:
> >> On 08/23/2012 02:27 PM, Ian Campbell wrote:
> >>>> @@ -103,6 +103,9 @@ static ssize_t rdexact(xc_interface *xch, struct restore_ctx *ctx,
> >>>> #else
> >>>> #define RDEXACT read_exact
> >>>> #endif
> >>>> +
> >>>> +#define QEMUSIG_SIZE 21
> >>>> +
> >>>> /*
> >>>> ** In the state file (or during transfer), all page-table pages are
> >>>> ** converted into a 'canonical' form where references to actual mfns
> >>>> @@ -467,7 +522,7 @@ static int buffer_tail_hvm(xc_interface *xch, struct restore_ctx *ctx,
> >>>> int vcpuextstate, uint32_t vcpuextstate_size)
> >>>> {
> >>>> uint8_t *tmp;
> >>>> - unsigned char qemusig[21];
> >>>> + unsigned char qemusig[QEMUSIG_SIZE + 1];
> >>>>
> >>> An extra + 1 here?
> >>>
> >> QEMUSIG_SIZE doesn't take into account the '\0'. So we need to add 1.
> >> If an error occurred, without +1, the output log lost the last character.
> >>
> > So this is just a bug fix for a pre-existing issue?
> >
> Yes.

Can we get it as a separate change?

> >>> [...]
> >>>
> >>>> - qemusig[20] = '\0';
> >>>> + qemusig[QEMUSIG_SIZE] = '\0';
> >>>>
> >>> This is one bigger than it used to be now.
> >>>
> >>> Perhaps this is an unrelated bug fix (I haven't check the real length of
> >>> the sig), in which case please can you split it out and submit
> >>> separately?
> >>>
> >> #define QEMU_SIGNATURE "DeviceModelRecord0002"
> >> Just checked, the length seems to be 21. I will send a patch with
> >> this change.
> >>
> > Perhaps use either sizeof(QEMU_SIGNATURE) or strlen(QEMU_SIGNATURE)
> > (depending on which semantics you want)?
> >
> Here, QEMU_SIZE needs to be define as strlen (QEMU_SIGNATURE),
> but QEMU_SIGNATURE is not defined in libxc. It's defined
> in libxl/libxl_internal.h.

Oh, right, this again :-/

> By the way, I'm wondering why QEMU save (libxl__domain_save_device_model)
> is made in libxl and restore (dump_qemu) in libxc ?

Mostly historical accident, we'd really like to sort this out one way or
the other but untangling the protocol and the callbacks etc is a pretty
big job.

In the meantime perhaps libxc could provide a suitable "typedef char
device_model_signature_t[21]"?

Ian.
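A shared typedef along the lines Ian suggests could look like the sketch below. Only the typedef itself comes from the thread; the DEVICE_MODEL_SIGNATURE macro and the signature_matches() helper are assumptions made for illustration, and nothing here is claimed to exist in libxc.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical shared definitions; libxc does not necessarily export these. */
#define DEVICE_MODEL_SIGNATURE "DeviceModelRecord0002"
typedef char device_model_signature_t[21];

/* Fail the build if the typedef and the signature string ever disagree. */
_Static_assert(sizeof(device_model_signature_t) ==
               sizeof(DEVICE_MODEL_SIGNATURE) - 1,
               "signature type must hold the signature without its NUL");

/*
 * Compare a raw (unterminated) signature read from a save file against
 * the expected one. Returns 1 on match, 0 otherwise.
 */
static int signature_matches(const char *sig)
{
    assert(sig != NULL);
    return memcmp(sig, DEVICE_MODEL_SIGNATURE,
                  sizeof(device_model_signature_t)) == 0;
}
```

With a type like this, the writer (libxl) and the reader (libxc) would share one size definition, and a buffer meant for logging would still be declared as `sizeof(device_model_signature_t) + 1` bytes to leave room for the terminator.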
Julien Grall
2012-Aug-24 12:56 UTC
Re: [XEN][RFC PATCH V2 12/17] xl: Add interface to handle qemu disaggregation
On 08/23/2012 02:30 PM, Ian Campbell wrote:
> On Wed, 2012-08-22 at 13:31 +0100, Julien Grall wrote:
>> This patch modifies libxl interface for qemu disaggregation.
>>
> I'd rather see the interfaces changes in the same patch as the
> implementation of the new interfaces.
>
>> For the moment, due to some dependencies between devices, we
>> can't let the user choose which QEMU emulate a device.
>>
>> Moreoever this patch adds an "id" field to nic interface.
>> It will be used in config file to specify which QEMU handle
>> the network card.
>>
> Is domid+devid not sufficient to identify which nic?

Can the user specify or find the devid easily? I added "id" because
I would like the user to be able to identify a network interface
without any problem.

>> A possible disaggregation is:
>> - UI: Emulate graphic card, USB, keyboard, mouse, default devices
>> (PIIX4, root bridge, ...)
>> - IDE: Emulate disk
>> - Serial: Emulate serial port
>> - Audio: Emulate audio card
>> - Net: Emulate one or more network cards, multiple QEMU can emulate
>> different card. The emulated card is specified with its nic ID.
>>
>> Signed-off-by: Julien Grall <julien.grall@citrix.com>
>> ---
>> tools/libxl/libxl.h | 3 +++
>> tools/libxl/libxl_types.idl | 15 +++++++++++++++
>> 2 files changed, 18 insertions(+), 0 deletions(-)
>>
>> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
>> index c614d6f..71d4808 100644
>> --- a/tools/libxl/libxl.h
>> +++ b/tools/libxl/libxl.h
>> @@ -307,6 +307,7 @@ void libxl_cpuid_dispose(libxl_cpuid_policy_list *cpuid_list);
>> #define LIBXL_PCI_FUNC_ALL (~0U)
>>
>> typedef uint32_t libxl_domid;
>> +typedef uint32_t libxl_dmid;
>>
>> /*
>> * Formatting Enumerations.
>> @@ -478,12 +479,14 @@ typedef struct {
>> libxl_domain_build_info b_info;
>>
>> int num_disks, num_nics, num_pcidevs, num_vfbs, num_vkbs;
>> + int num_dms;
>>
>> libxl_device_disk *disks;
>> libxl_device_nic *nics;
>> libxl_device_pci *pcidevs;
>> libxl_device_vfb *vfbs;
>> libxl_device_vkb *vkbs;
>> + libxl_dm *dms;
>>
>> libxl_action_on_shutdown on_poweroff;
>> libxl_action_on_shutdown on_reboot;
>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>> index daa8c79..36c802a 100644
>> --- a/tools/libxl/libxl_types.idl
>> +++ b/tools/libxl/libxl_types.idl
>> @@ -246,6 +246,20 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
>> ("extratime", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
>> ])
>>
>> +libxl_dm_cap = Enumeration("dm_cap", [
>> + (1, "UI"), # Emulate all UI + default device
>>
> What does "default device" equate too?

The following devices:
- i440fx
- piix3
- piix4
- dma
- xen apic
- xen platform

>> + (2, "IDE"), # Emulate IDE
>> + (4, "SERIAL"), # Emulate Serial
>> + (8, "AUDIO"), # Emulate audio
>> + ])
>> +
>> +libxl_dm = Struct("dm", [
>> + ("name", string),
>> + ("path", string),
>> + ("capabilities", uint64),
>>
> uint64 and not libxl_dm_cap?

Will be fixed in the next patch version.

--
Julien
Ian Campbell
2012-Aug-24 13:03 UTC
Re: [XEN][RFC PATCH V2 12/17] xl: Add interface to handle qemu disaggregation
On Fri, 2012-08-24 at 13:56 +0100, Julien Grall wrote:
> On 08/23/2012 02:30 PM, Ian Campbell wrote:
> > On Wed, 2012-08-22 at 13:31 +0100, Julien Grall wrote:
> >> This patch modifies libxl interface for qemu disaggregation.
> >>
> > I'd rather see the interfaces changes in the same patch as the
> > implementation of the new interfaces.
> >
> >> For the moment, due to some dependencies between devices, we
> >> can't let the user choose which QEMU emulate a device.
> >>
> >> Moreoever this patch adds an "id" field to nic interface.
> >> It will be used in config file to specify which QEMU handle
> >> the network card.
> >>
> > Is domid+devid not sufficient to identify which nic?
> >
> Is the user can specify or find devid easily ?
> I added "id" because, I would like that the user
> can identify without any problem a network
> interface.

At the libxl level the libxl_device_nic struct has a devid in it. That's
not to say that xl can't add a layer of naming and indirection on top.

> >> @@ -246,6 +246,20 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
> >> ("extratime", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
> >> ])
> >>
> >> +libxl_dm_cap = Enumeration("dm_cap", [
> >> + (1, "UI"), # Emulate all UI + default device
> >>
> > What does "default device" equate too?
> >
> The following devices:
> - i440fx
> - piix3
> - piix4
> - dma
> - xen apic
> - xen platform

So this is more like "CORE" than "UI"?

Is there a reason why UI (which I guess means the VGA, spice and VFB
devices?) are required to be in the same emulator as these?

> >> + (2, "IDE"), # Emulate IDE
> >> + (4, "SERIAL"), # Emulate Serial
> >> + (8, "AUDIO"), # Emulate audio
> >> + ])
> >> +
Julien Grall
2012-Aug-24 13:12 UTC
Re: [XEN][RFC PATCH V2 14/17] xl-parsing: Parse new device_models option
On 08/23/2012 02:35 PM, Ian Campbell wrote:
> On Wed, 2012-08-22 at 13:32 +0100, Julien Grall wrote:
>> Add new option "device_models". The user can specify the capability of the
>> QEMU (ui, vifs, ...). This option only works with QEMU upstream (qemu-xen).
>>
>> For instance:
>> device_models= [ 'name=all,vifs=nic1', 'name=qvga,ui', 'name=qide,ide' ]
>>
> iirc you can give multiple vifs -- what does that syntax look like?

vifs=nic1;nic2

> I didn't ask before -- what does naming the dm give you? Is it just used
> for ui things like logging or can you cross reference this in some way?

It's used for logging and in the QEMU log filename. It's not mandatory.

>> Signed-off-by: Julien Grall <julien.grall@citrix.com>
>> ---
>> tools/libxl/Makefile | 2 +-
>> tools/libxl/libxlu_dm.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++
>> tools/libxl/libxlutil.h | 5 ++
>> tools/libxl/xl_cmdimpl.c | 29 +++++++++++++-
>> 4 files changed, 130 insertions(+), 2 deletions(-)
>> create mode 100644 tools/libxl/libxlu_dm.c
>>
>> diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
>> index 47fb110..2b58721 100644
>> --- a/tools/libxl/Makefile
>> +++ b/tools/libxl/Makefile
>> @@ -79,7 +79,7 @@ AUTOINCS= libxlu_cfg_y.h libxlu_cfg_l.h _libxl_list.h _paths.h \
>> AUTOSRCS= libxlu_cfg_y.c libxlu_cfg_l.c
>> AUTOSRCS += _libxl_save_msgs_callout.c _libxl_save_msgs_helper.c
>> LIBXLU_OBJS = libxlu_cfg_y.o libxlu_cfg_l.o libxlu_cfg.o \
>> - libxlu_disk_l.o libxlu_disk.o libxlu_vif.o libxlu_pci.o
>> + libxlu_disk_l.o libxlu_disk.o libxlu_vif.o libxlu_pci.o libxlu_dm.o
>> $(LIBXLU_OBJS): CFLAGS += $(CFLAGS_libxenctrl) # For xentoollog.h
>>
>> CLIENTS = xl testidl libxl-save-helper
>> diff --git a/tools/libxl/libxlu_dm.c b/tools/libxl/libxlu_dm.c
>> new file mode 100644
>> index 0000000..9f0a347
>> --- /dev/null
>> +++ b/tools/libxl/libxlu_dm.c
>> @@ -0,0 +1,96 @@
>> +#include "libxl_osdeps.h" /* must come before any other headers */
>> +#include <stdlib.h>
>> +#include "libxlu_internal.h"
>> +#include "libxlu_cfg_i.h"
>> +
>> +static void split_string_into_string_list(const char *str,
>> + const char *delim,
>> + libxl_string_list *psl)
>>
> Is this a cut-n-paste of the one in xl_cmdimpl.c or did it change?
>
> Probably better to add this as a common utility function somewhere.

It's nearly the same, except that it skips blanks at the beginning of
a value. For instance, given 'foo; bar', the function will return
['foo', 'bar'].

--
Julien
Julien Grall
2012-Aug-24 13:23 UTC
Re: [XEN][RFC PATCH V2 12/17] xl: Add interface to handle qemu disaggregation
On 08/24/2012 02:03 PM, Ian Campbell wrote:
>>>> @@ -246,6 +246,20 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
>>>> ("extratime", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_EXTRATIME_DEFAULT'}),
>>>> ])
>>>>
>>>> +libxl_dm_cap = Enumeration("dm_cap", [
>>>> + (1, "UI"), # Emulate all UI + default device
>>>>
>>> What does "default device" equate too?
>>>
>> The following devices:
>> - i440fx
>> - piix3
>> - piix4
>> - dma
>> - xen apic
>> - xen platform
>>
> So this is more like "CORE" than "UI"?
>
> Is there a reason why UI (which I guess means the VGA, spice and VFB
> devices?) are required to be in the same emulator as these?

VGA, keyboard and mouse (which can be plugged in via USB) need to be in
the same emulator. Otherwise we can't use VNC or something like that.

I made this choice, after discussion with Stefano, because these
devices depend on each other. For instance, the keyboard emulates A20.
Julien Grall
2012-Aug-24 13:51 UTC
Re: [XEN][RFC PATCH V2 15/17] xl: support spawn/destroy on multiple device model
On 08/23/2012 02:56 PM, Ian Campbell wrote:
> On Wed, 2012-08-22 at 13:32 +0100, Julien Grall wrote:
>> @@ -991,12 +1057,11 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
>> libxl__device_console_dispose(&console);
>>
>> if (need_qemu) {
>> - dcs->dmss.dm.guest_domid = domid;
>> - libxl__spawn_local_dm(egc, &dcs->dmss.dm);
>> + assert(dcs->dmss);
>> + domcreate_spawn_devmodel(egc, dcs, dcs->current_dmid);
>> return;
>> } else {
>> - assert(!dcs->dmss.dm.guest_domid);
>> - domcreate_devmodel_started(egc, &dcs->dmss.dm, 0);
>> + assert(!dcs->dmss);
>>
> Doesn't this stop progress in this case meaning we'll never get to the
> end of the async op?

Indeed, I will fix it in the next patch version.

>> return;
>> }
>> }
>>
> [..]
>
>> @@ -1044,7 +1044,8 @@ int libxl__wait_for_device_model(libxl__gc *gc,
>> void *check_callback_userdata)
>> {
>> char *path;
>> - path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
>> + path = libxl__sprintf(gc, "/local/domain/0/dms/%u/%u/state",
>> + domid, dmid);
>>
> Isn't this control path shared with qemu? I'm not sure we can just
> change it like that? We need to at least retain compatibility with
> pre-disag qemus.

Indeed. As we have multiple QEMUs for the same domain, we need one
control path per QEMU.

Pre-disag QEMUs cannot work with my changes inside Xen. By default,
Xen will not forward an ioreq if there is no ioreq server.

>> const char *libxl__domain_device_model(libxl__gc *gc,
>> - const libxl_domain_build_info *info)
>> + uint32_t dmid,
>> + const libxl_domain_build_info *b_info)
>> {
>> libxl_ctx *ctx = libxl__gc_owner(gc);
>> const char *dm;
>> + libxl_domain_config *guest_config = CONTAINER_OF(b_info, *guest_config,
>> + b_info);
>>
>> - if (libxl_defbool_val(info->device_model_stubdomain))
>> + if (libxl_defbool_val(guest_config->b_info.device_model_stubdomain))
>>
> You just extracted guest_config from b_info but you still have the
> b_info point to hand. Why not use it? Likewise a few more times below.

That's an error, it will be fixed in the next patch version.

>> + * PCI device number. Before 3, we have IDE, ISA, SouthBridge and
>> + * XEN PCI. Theses devices will be emulate in each QEMU, but only
>> + * one QEMU (the one which emulates default device) will register
>> + * these devices through Xen PCI hypercall.
>> + */
>> + static unsigned int bdf = 3;
>>
> Do you mean const rather than static?

No, static. With QEMU disaggregation, the toolstack allocates BDFs
incrementally. A QEMU is unable to know whether a BDF is already
allocated in another QEMU. For the moment, the bdf variable is used to
give a devfn to the network cards and the VGA.

> Isn't this baking in some implementation detail from the current qemu
> version? What happens if it changes?

I don't have another way for the moment. I would be happy if someone
has a good solution.

>> +
>> libxl_ctx *ctx = libxl__gc_owner(gc);
>> const libxl_domain_create_info *c_info = &guest_config->c_info;
>> const libxl_domain_build_info *b_info = &guest_config->b_info;
>> + const libxl_dm *dm_config = &guest_config->dms[dmid];
>> const libxl_device_disk *disks = guest_config->disks;
>> const libxl_device_nic *nics = guest_config->nics;
>> const int num_disks = guest_config->num_disks;
>> const int num_nics = guest_config->num_nics;
>> - const libxl_vnc_info *vnc = libxl__dm_vnc(guest_config);
>> + const libxl_vnc_info *vnc = libxl__dm_vnc(dmid, guest_config);
>> const libxl_sdl_info *sdl = dm_sdl(guest_config);
>> const char *keymap = dm_keymap(guest_config);
>> flexarray_t *dm_args;
>> int i;
>> uint64_t ram_size;
>> + uint32_t cap_ui = dm_config->capabilities & LIBXL_DM_CAP_UI;
>> + uint32_t cap_ide = dm_config->capabilities & LIBXL_DM_CAP_IDE;
>> + uint32_t cap_serial = dm_config->capabilities & LIBXL_DM_CAP_SERIAL;
>> + uint32_t cap_audio = dm_config->capabilities & LIBXL_DM_CAP_AUDIO;
>>
> ->capabilities is defined as 64 bits, but you use 32 here, which happens
> to work if you know what the actual values of the enum are but whoever
> adds the 33rd capability will probably get it wrong.
>
> bool cap_foo = !! (dm_....capabiltieis & LIBXL_DM_CAP_FOO)
>
> would probably work?

Indeed, it will be fixed in the next patch version.

>> dm_args = flexarray_make(16, 1);
>> if (!dm_args)
>> @@ -348,11 +389,12 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
>> "-xen-domid",
>> libxl__sprintf(gc, "%d", guest_domid), NULL);
>>
>> + flexarray_append(dm_args, "-nodefaults");
>>
> Does this not cause a change in behaviour other than what you've
> accounted for here?

By default QEMU emulates a VGA card and a network card. This option
disables them and avoids having to add "-net none".
I added it after a discussion on my first patch series:
https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg04767.html

>> @@ -528,65 +583,69 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
>> abort();
>> }
>>
>> - ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb);
>> + // Allocate ram space of 32Mo per previous device model to store rom
>>
> What is this about?
>
> (also that Mo looks a bit odd in among all these mb's)

It's space for ROM allocation (VGA ROM, rtl8139 ROM, ...).
Each QEMU can load ROMs into memory, but its memory allocator assumes
that it is alone. It starts to allocate ROM space from the end of RAM.

It's a solution suggested by Stefano; it avoids modifying QEMU. As we
don't know the number of ROMs and their size per QEMU, we chose a space
of 32 Mo to be safe, but in the end, most of the time this memory is
not allocated.

--
Julien
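The truncation Ian warns about can be shown with a small sketch. The first four flag values are the ones from the libxl_dm_cap enumeration quoted in the thread; DM_CAP_FUTURE, dm_has_cap() and dm_cap_truncated() are hypothetical names used only to illustrate what happens once a capability lands above bit 31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the libxl_dm_cap enumeration in the quoted patch. */
#define DM_CAP_UI     UINT64_C(1)
#define DM_CAP_IDE    UINT64_C(2)
#define DM_CAP_SERIAL UINT64_C(4)
#define DM_CAP_AUDIO  UINT64_C(8)
/* Hypothetical "33rd capability", for illustration only. */
#define DM_CAP_FUTURE (UINT64_C(1) << 32)

/* Normalise the masked value to a bool, as suggested in the review. */
static bool dm_has_cap(uint64_t capabilities, uint64_t cap)
{
    assert(cap != 0);
    return !!(capabilities & cap);
}

/*
 * The pattern from the patch: storing the masked 64-bit value in a
 * 32-bit variable silently drops any bit above bit 31.
 */
static uint32_t dm_cap_truncated(uint64_t capabilities, uint64_t cap)
{
    return (uint32_t)(capabilities & cap);
}
```

For the four existing flags both helpers agree; for a flag above bit 31 the truncating variant yields 0 even when the capability is set, which is the failure mode the `!!` form avoids.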
Ian Campbell
2012-Aug-24 14:09 UTC
Re: [XEN][RFC PATCH V2 15/17] xl: support spawn/destroy on multiple device model
On Fri, 2012-08-24 at 14:51 +0100, Julien Grall wrote:
> >> @@ -1044,7 +1044,8 @@ int libxl__wait_for_device_model(libxl__gc *gc,
> >> void *check_callback_userdata)
> >> {
> >> char *path;
> >> - path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
> >> + path = libxl__sprintf(gc, "/local/domain/0/dms/%u/%u/state",
> >> + domid, dmid);
> >>
> > Isn't this control path shared with qemu? I'm not sure we can just
> > change it like that? We need to at least retain compatibility with
> > pre-disag qemus.
> >
> Indeed, as we have multiple QEMUs for a same domain, we need
> to have one control path by QEMU.
>
> Pre-disag QEMUs cannot work with my changes inside the Xen.
> Xen will not forward by default ioreq if there is no ioreq server.

We might need to consider making disagg an opt in feature, with the
default being to have as we do today.

> >> + * PCI device number. Before 3, we have IDE, ISA, SouthBridge and
> >> + * XEN PCI. Theses devices will be emulate in each QEMU, but only
> >> + * one QEMU (the one which emulates default device) will register
> >> + * these devices through Xen PCI hypercall.
> >> + */
> >> + static unsigned int bdf = 3;
> >>
> > Do you mean const rather than static?
> >
> No static. With QEMU disaggregation, the toolstack allocate
> BDF incrementaly. QEMU is unable to know if a BDF is already
> allocated in another QEMU.

This is broken if the toolstack is building multiple domains, since the
bdf will be preserved across each of them.

You need to put this in some sort of data structure specific to this
particular iteration of the builder code. We must surely have something
suitable close to hand in this function. libxl__domain_build_state
perhaps?

A static variable in a library is almost always a mistake.

> > Isn't this baking in some implementation detail from the current qemu
> > version? What happens if it changes?
> >
> I don't have another way for the moment. I would be happy,
> if someone have a good solution.

Could we at least make the assignments of the 3 prior BDFs explicit on
the command line too?

> >> dm_args = flexarray_make(16, 1);
> >> if (!dm_args)
> >> @@ -348,11 +389,12 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
> >> "-xen-domid",
> >> libxl__sprintf(gc, "%d", guest_domid), NULL);
> >>
> >> + flexarray_append(dm_args, "-nodefaults");
> >>
> > Does this not cause a change in behaviour other than what you've
> > accounted for here?
> >
> By default QEMU emulates VGA card, and a network card. This options,
> disabled it and avoid to add "-net none".
> I added it after a discussion on my first patch series.
> https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg04767.html

OK.

> >> @@ -528,65 +583,69 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
> >> abort();
> >> }
> >>
> >> - ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb);
> >> + // Allocate ram space of 32Mo per previous device model to store rom
> >>
> > What is this about?
> >
> > (also that Mo looks a bit odd in among all these mb's)
> >
> It's space for ROM allocation, like vga, rtl8139 roms ...
> Each QEMU can load ROM and memory, but the memory
> allocator consider that it's alone. It starts to allocate
> ROM space from the end of memory RAM.
>
> It's a solution suggest by Stefano, it's avoid modification
> in QEMU. As we don't know the number of ROM and their
> size per QEMU, we chose a space of 32 Mo to be sure, but in
> fine most of time memory is not allocated.

"32Mo per previous device model" is the bit which struck me as odd. That
means the first device model uses 32Mo, the second 64Mo, the third 96Mo
etc?

Aren't we already modifying qemu quite substantially to implement this
functionality anyway? so why are we trying to avoid it in this one
corner? Especially at the cost of doing something which on the face of
it looks quite strange!

Isn't space for the ROMs allocated by SeaBIOS as part of enumerating the
PCI bus anyway? Or is this a different per-ROM allocation?

Ian.
Julien Grall
2012-Aug-24 14:37 UTC
Re: [XEN][RFC PATCH V2 15/17] xl: support spawn/destroy on multiple device model
On 08/24/2012 03:09 PM, Ian Campbell wrote:
> On Fri, 2012-08-24 at 14:51 +0100, Julien Grall wrote:
>>>> @@ -1044,7 +1044,8 @@ int libxl__wait_for_device_model(libxl__gc *gc,
>>>> void *check_callback_userdata)
>>>> {
>>>> char *path;
>>>> - path = libxl__sprintf(gc, "/local/domain/0/device-model/%d/state", domid);
>>>> + path = libxl__sprintf(gc, "/local/domain/0/dms/%u/%u/state",
>>>> + domid, dmid);
>>>>
>>> Isn't this control path shared with qemu? I'm not sure we can just
>>> change it like that? We need to at least retain compatibility with
>>> pre-disag qemus.
>>>
>> Indeed, as we have multiple QEMUs for a same domain, we need
>> to have one control path by QEMU.
>>
>> Pre-disag QEMUs cannot work with my changes inside the Xen.
>> Xen will not forward by default ioreq if there is no ioreq server.
>>
> We might need to consider making disagg an opt in feature, with the
> default being to have as we do today.

When you say feature, is it only for libxl or also for Xen?

In the case of libxl, if the 'device_models' option is not specified,
we use only one QEMU. So it stays compatible with previous
configuration files.

In the case of Xen, it's hard to keep compatibility. We can still
spawn only one QEMU, but the ioreq handling will not send an I/O
request if no device model has registered for it. There is no longer
a default QEMU.

>>>> + * PCI device number. Before 3, we have IDE, ISA, SouthBridge and
>>>> + * XEN PCI. Theses devices will be emulate in each QEMU, but only
>>>> + * one QEMU (the one which emulates default device) will register
>>>> + * these devices through Xen PCI hypercall.
>>>> + */
>>>> + static unsigned int bdf = 3;
>>>>
>>> Do you mean const rather than static?
>>>
>> No static. With QEMU disaggregation, the toolstack allocate
>> BDF incrementaly. QEMU is unable to know if a BDF is already
>> allocated in another QEMU.
>>
> This is broken if the toolstack is building multiple domains, since the
> bdf will be preserved across each of them.
>
> You need to put this in some sort of data structure specific to this
> particular iteration of the builder code. We must surely have something
> suitable close to hand in this function. libxl__domain_build_state
> perhaps?

It will be fixed in the next patch version.

> A static variable in a library is almost always a mistake.
>
>>> Isn't this baking in some implementation detail from the current qemu
>>> version? What happens if it changes?
>>>
>> I don't have another way for the moment. I would be happy,
>> if someone have a good solution.
>>
> Could we at least make the assignments of the 3 prior BDFs explicit on
> the command line too?

I don't understand your question. These 3 prior BDFs can't be modified
via the QEMU command line (or I don't know how).

>>>> @@ -528,65 +583,69 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
>>>> abort();
>>>> }
>>>>
>>>> - ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb);
>>>> + // Allocate ram space of 32Mo per previous device model to store rom
>>>>
>>> What is this about?
>>>
>>> (also that Mo looks a bit odd in among all these mb's)
>>>
>> It's space for ROM allocation, like vga, rtl8139 roms ...
>> Each QEMU can load ROM and memory, but the memory
>> allocator consider that it's alone. It starts to allocate
>> ROM space from the end of memory RAM.
>>
>> It's a solution suggest by Stefano, it's avoid modification
>> in QEMU. As we don't know the number of ROM and their
>> size per QEMU, we chose a space of 32 Mo to be sure, but in
>> fine most of time memory is not allocated.
>>
> "32Mo per previous device model" is the bit which struck me as odd. That
> means the first device model uses 32Mo, the second 64Mo, the third 96Mo
> etc?

That means:
- the first QEMU can allocate ROMs after ram_size + 0
- the second after ram_size + 32 Mo
- ...

It's a hack to avoid modifying the QEMU memory allocator
(find_ram_offset in exec.c in QEMU).

> Aren't we already modifying qemu quite substantially to implement this
> functionality anyway? so why are we trying to avoid it in this one
> corner? Especially at the cost of doing something which on the face of
> it looks quite strange!

It's not possible to do it in QEMU, otherwise the QEMUs would need to
be spawned one by one. Indeed, the next QEMU needs to know the last
'address' used by the previous QEMU.

I made a modification along these lines, but it was abandoned because
it required XenStore.

> Isn't space for the ROMs allocated by SeaBIOS as part of enumerating the
> PCI bus anyway? Or is this a different per-ROM allocation?

It's the ROM allocated via pci_add_option_rom in QEMU.
QEMU seems to store the ROM in memory, and then SeaBIOS
copies it to the right place.

--
Julien
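The offset scheme described above can be written down as a one-line calculation. This is a sketch of the arithmetic only, assuming device model ids count from 0; rom_window_start_mb() and ROM_WINDOW_MB are hypothetical names, and how the patch actually passes the value to each QEMU is not shown in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define ROM_WINDOW_MB 32u /* space reserved per device model, in MB ("Mo") */

/*
 * Offset, in MB, after which device model number 'dmid' may place its
 * ROMs: ram_size + 0 for the first one, ram_size + 32 for the second,
 * and so on.
 */
static uint64_t rom_window_start_mb(uint64_t ram_size_mb, uint32_t dmid)
{
    uint64_t shift = (uint64_t)ROM_WINDOW_MB * dmid;

    assert(ram_size_mb <= UINT64_MAX - shift); /* guard against wrap-around */
    return ram_size_mb + shift;
}
```

Each device model gets its own 32 MB window above guest RAM, so two QEMUs whose allocators each assume they are alone cannot hand out overlapping ROM offsets, provided no single QEMU needs more than 32 MB of ROM space.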
Ian Campbell
2012-Aug-24 14:45 UTC
Re: [XEN][RFC PATCH V2 15/17] xl: support spawn/destroy on multiple device model
On Fri, 2012-08-24 at 15:37 +0100, Julien Grall wrote:
> In case of Xen, it's hard to have a compatibility. We can
> still spawn only one QEMU, but ioreq handling will not
> send an io request if no device models registered it.
> There is no more default QEMU.

This means we've broken existing qemu on a new hypervisor, which now
that we have Xen support in upstream qemu is something we need to think
about and decide if we are happy with that or not.

Perhaps it is sufficient for this to be a compile time thing, i.e.
detect if we are building against a disagg capable hypervisor or not.

Or maybe it has to be a runtime thing with Xen only turning off the
default QEMU when the first io req region is registered, or something
like that.

> >>> Isn't this baking in some implementation detail from the current qemu
> >>> version? What happens if it changes?
> >>>
> >> I don't have another way for the moment. I would be happy,
> >> if someone have a good solution.
> >>
> > Could we at least make the assignments of the 3 prior BDFs explicit on
> > the command line too?
> >
> I don't understand your question. Theses 3 priors BDFs can't
> be modify via QEMU command line (or I don't know how).

Could qemu be modified to allow this?

> >>>> @@ -528,65 +583,69 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc,
> >>>> abort();
> >>>> }
> >>>>
> >>>> - ram_size = libxl__sizekb_to_mb(b_info->max_memkb - b_info->video_memkb);
> >>>> + // Allocate ram space of 32Mo per previous device model to store rom
> >>>>
> >>> What is this about?
> >>>
> >>> (also that Mo looks a bit odd in among all these mb's)
> >>>
> >> It's space for ROM allocation, like vga, rtl8139 roms ...
> >> Each QEMU can load ROM and memory, but the memory
> >> allocator consider that it's alone. It starts to allocate
> >> ROM space from the end of memory RAM.
> >>
> >> It's a solution suggest by Stefano, it's avoid modification
> >> in QEMU. As we don't know the number of ROM and their
> >> size per QEMU, we chose a space of 32 Mo to be sure, but in
> >> fine most of time memory is not allocated.
> >>
> > "32Mo per previous device model" is the bit which struck me as odd. That
> > means the first device model uses 32Mo, the second 64Mo, the third 96Mo
> > etc?
> >
> That means:
> - first QEMU can allocate ROM after ram_size + 0
> - second after ram_size + 32 mo
> - ...
>
> It's a hack to avoid modification in QEMU memory allocator
> (find_ram_offset exec.c in QEMU).

Why don't we enhance the memory allocator instead of adding hacks?

> > Aren't we already modifying qemu quite substantially to implement this
> > functionality anyway? so why are we trying to avoid it in this one
> > corner? Especially at the cost of doing something which on the face of
> > it looks quite strange!
> >
> It's not possible to made it in QEMU, otherwise QEMU need to
> be spawn one by one. Indeed, the next QEMU need to know
> what is the last 'address' used by the previous QEMU.

Or each one needs to be told explicitly where to put its ROMs. Encoding
a magic 32Mo*N in the interface is just too hacky.

> I made a modification in this way, but it was abandoned. Indeed,
> it required XenStore.
>
> > Isn't space for the ROMs allocated by SeaBIOS as part of enumerating the
> > PCI bus anyway? Or is this a different per-ROM allocation?
> >
> It's the rom allocated via pci_add_option_rom in QEMU.
> QEMU seems to store ROM in memory and then SeaBIOS
> will copy it, in the right place.

So the ROM binary (the content of the ROM_BAR) is stored in "guest"
memory? That seems a bit odd to me, I'd have thought it would be stored
in the host and provided on demand when the ROM BAR was accessed.

Is there any scope for changing this behaviour?

Ian.
>>> On 23.08.12 at 12:52, Julien Grall <julien.grall@citrix.com> wrote:
> On 08/23/2012 08:27 AM, Jan Beulich wrote:
>>> switch ( a.index )
>>> {
>>> - case HVM_PARAM_IOREQ_PFN:
>>>
>> Removing sub-ops which a domain can issue for itself (which for this and
>> another one below appears to be the case) is not allowed.
>>
>
> I removed these 3 sub-ops because it will not work with
> QEMU disaggregation. Shared pages and event channel
> for IO request are private for each device model.

Then they need to be made inaccessible for that specific setup, not
removed altogether.

>>> + case HVM_PARAM_IO_PFN_FIRST:
>>>
>> I don't see where in this patch this and the other new sub-op constants
>> get defined.
>>
> Both sub-op constants are added in patch 1:
> http://lists.xen.org/archives/html/xen-devel/2012-08/msg01767.html

Hmm, I can certainly see reasons for breaking up things that way,
but I generally prefer patches to represent functional units.

Jan
On 08/24/2012 04:38 PM, Jan Beulich wrote:
>>>> On 23.08.12 at 12:52, Julien Grall <julien.grall@citrix.com> wrote:
>> On 08/23/2012 08:27 AM, Jan Beulich wrote:
>>>> switch ( a.index )
>>>> {
>>>> - case HVM_PARAM_IOREQ_PFN:
>>>>
>>> Removing sub-ops which a domain can issue for itself (which for this and
>>> another one below appears to be the case) is not allowed.
>>>
>> I removed these 3 sub-ops because it will not work with
>> QEMU disaggregation. Shared pages and event channel
>> for IO request are private for each device model.
>>
> Then they need to be made inaccessible for that specific setup, not
> removed altogether.

What do you mean by specific feature?
With this patch series, you are able to handle one or more QEMUs.
Keeping compatibility with the old IO emulation is hard. It's still
possible, but the ioreq handling will not send an ioreq if no device
model has registered for it. There is no longer a default QEMU.

I have sent a long patch series for QEMU, but it's for supporting
"full disaggregation" (i.e. multiple QEMUs per domain). If you want to
handle only one QEMU, the patch shrinks to only 100 lines, so it
can be backported easily.

>>>> + case HVM_PARAM_IO_PFN_FIRST:
>>>>
>>> I don't see where in this patch this and the other new sub-op constants
>>> get defined.
>>>
>> Both sub-op constants are added in patch 1:
>> http://lists.xen.org/archives/html/xen-devel/2012-08/msg01767.html
>>
> Hmm, I can certainly see reasons for breaking up things that way,
> but I generally prefer patches to represent functional units.

I will rework my patch series to represent functional units.

--
Julien Grall
>>> On 10.09.12 at 15:02, Julien Grall <julien.grall@citrix.com> wrote:
> On 08/24/2012 04:38 PM, Jan Beulich wrote:
>>>>> On 23.08.12 at 12:52, Julien Grall <julien.grall@citrix.com> wrote:
>>> On 08/23/2012 08:27 AM, Jan Beulich wrote:
>>>>> switch ( a.index )
>>>>> {
>>>>> - case HVM_PARAM_IOREQ_PFN:
>>>>
>>>> Removing sub-ops which a domain can issue for itself (which for this and
>>>> another one below appears to be the case) is not allowed.
>>>
>>> I removed these 3 sub-ops because it will not work with
>>> QEMU disaggregation. Shared pages and event channel
>>> for IO request are private for each device model.
>>
>> Then they need to be made inaccessible for that specific setup, not
>> removed altogether.
>
> What do you mean by specific feature?
> With this patch series, you are able to handle one or more
> QEMU.
> Keeping compatibility with the old IO emulation is hard.

Did you read my original reply? Code backing operations that a
guest can issue itself (i.e. without qemu or another host side
component involved) just can't be removed, as you/we have
no control over which guest(s) may be making use of that
functionality.

Jan
On 09/10/2012 02:23 PM, Jan Beulich wrote:
>>>> On 10.09.12 at 15:02, Julien Grall <julien.grall@citrix.com> wrote:
>> On 08/24/2012 04:38 PM, Jan Beulich wrote:
>>>>>> On 23.08.12 at 12:52, Julien Grall <julien.grall@citrix.com> wrote:
>>>> On 08/23/2012 08:27 AM, Jan Beulich wrote:
>>>>>> switch ( a.index )
>>>>>> {
>>>>>> - case HVM_PARAM_IOREQ_PFN:
>>>>>
>>>>> Removing sub-ops which a domain can issue for itself (which for this and
>>>>> another one below appears to be the case) is not allowed.
>>>>
>>>> I removed these 3 sub-ops because it will not work with
>>>> QEMU disaggregation. Shared pages and event channel
>>>> for IO request are private for each device model.
>>>
>>> Then they need to be made inaccessible for that specific setup, not
>>> removed altogether.
>>
>> What do you mean by specific feature?
>> With this patch series, you are able to handle one or more
>> QEMU.
>> Keeping compatibility with the old IO emulation is hard.
>
> Did you read my original reply? Code backing operations that a
> guest can issue itself (i.e. without qemu or another host side
> component involved) just can't be removed, as you/we have
> no control over which guest(s) may be making use of that
> functionality.

Ah OK, that was a misunderstanding on my part. I don't really understand
in which case a domain needs to retrieve its ioreq page.
How can I make it inaccessible? Just rc = -EINVAL?

--
Julien Grall
>>> On 10.09.12 at 15:35, Julien Grall <julien.grall@citrix.com> wrote:
> On 09/10/2012 02:23 PM, Jan Beulich wrote:
>>>>> On 10.09.12 at 15:02, Julien Grall <julien.grall@citrix.com> wrote:
>>> On 08/24/2012 04:38 PM, Jan Beulich wrote:
>>>>>>> On 23.08.12 at 12:52, Julien Grall <julien.grall@citrix.com> wrote:
>>>>> On 08/23/2012 08:27 AM, Jan Beulich wrote:
>>>>>>> switch ( a.index )
>>>>>>> {
>>>>>>> - case HVM_PARAM_IOREQ_PFN:
>>>>>>
>>>>>> Removing sub-ops which a domain can issue for itself (which for this and
>>>>>> another one below appears to be the case) is not allowed.
>>>>>
>>>>> I removed these 3 sub-ops because it will not work with
>>>>> QEMU disaggregation. Shared pages and event channel
>>>>> for IO request are private for each device model.
>>>>
>>>> Then they need to be made inaccessible for that specific setup, not
>>>> removed altogether.
>>>
>>> What do you mean by specific feature?
>>> With this patch series, you are able to handle one or more
>>> QEMU.
>>> Keeping compatibility with the old IO emulation is hard.
>>
>> Did you read my original reply? Code backing operations that a
>> guest can issue itself (i.e. without qemu or another host side
>> component involved) just can't be removed, as you/we have
>> no control over which guest(s) may be making use of that
>> functionality.
>
> Ah OK, that was a misunderstanding on my part. I don't really understand
> in which case a domain needs to retrieve its ioreq page.
> How can I make it inaccessible? Just rc = -EINVAL?

Probably, but you'd better talk to whoever added that code (including
to determine whether this was left guest invokable by mistake).

Jan
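The outcome discussed in this exchange, keeping the sub-op but refusing it with -EINVAL only in the disaggregated setup, might look roughly like the sketch below. It is an assumption-laden illustration, not the actual Xen handler: the `sketch_` structure, the `disaggregated` flag, and the parameter indices are hypothetical, and only the idea of returning -EINVAL comes from the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter indices; the values are illustrative only. */
enum {
    SKETCH_PARAM_OTHER = 0,      /* any parameter unaffected by the change */
    SKETCH_PARAM_IOREQ_PFN = 1,  /* stands in for HVM_PARAM_IOREQ_PFN */
    SKETCH_NR_PARAMS = 2
};

/* Hypothetical per-domain state. */
struct sketch_domain {
    bool disaggregated;                  /* several device models in use */
    uint64_t params[SKETCH_NR_PARAMS];
};

/*
 * The case label is kept, so a guest that issues the sub-op for itself
 * still works in the classic single-QEMU setup. It is refused only when
 * the ioreq page is private to each device model.
 */
static int sketch_get_param(const struct sketch_domain *d,
                            unsigned int index, uint64_t *value)
{
    assert(d != NULL && value != NULL);

    if (index >= SKETCH_NR_PARAMS)
        return -EINVAL;

    switch (index) {
    case SKETCH_PARAM_IOREQ_PFN:
        if (d->disaggregated)
            return -EINVAL;   /* inaccessible for this setup, not removed */
        break;
    default:
        break;
    }

    *value = d->params[index];
    return 0;
}
```

Whether -EINVAL is the right error, and whether the sub-op was meant to be guest invokable in the first place, is left open in the thread.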