Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
Hi,

This is patch set v6. It includes all pending patches that are needed to enable gpgpu passthrough and heterogeneous computing for guests.

thanks,
Wei

For more details, please refer to the old threads:
http://lists.xen.org/archives/html/xen-devel/2012-02/msg00889.html
http://lists.xen.org/archives/html/xen-devel/2012-01/msg01646.html

and, for an overview of the design, please refer to:
http://www.amd64.org/pub/iommuv2.png

=====================================================================
Changes in v6:
* Fix indentation issues.
* Fix definition of iommu_set_msi.
* Rebase on top of tip.

Changes in v5:
* Remove patch 2 after upstream c/s 24729:6f6a6d1d2fb6.

Changes in v4:
* Only the tools part in this version, since the hypervisor patches have already been committed.
* Rename guest config option from "iommu = {0,1}" to "guest_iommu = {0,1}".
* Add a description to docs/man/xl.cfg.pod.5.

Changes in v3:
* Use xenstore to receive the guest iommu configuration instead of adding a new field to hvm_info_table.
* Support PCI segments in vbdf-to-mbdf binding.
* Make hypercalls visible for non-x86 platforms.
* A few code cleanups according to comments from Jan and Ian.

Changes in v2:
* Do not use a linked list to access guest iommu tables.
* Do not parse the iommu parameter in libxl_device_model_info again.
* Fix incorrect logical calculation in patch 11.
* Fix hypercall definition for non-x86 systems.
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210211 -3600
# Node ID 810296995a893c1bec898a366f08520fbae755fb
# Parent  f611200469159a960c3f0b975aa0f91ec768e29f
amd iommu: Add 2 hypercalls for libxc

iommu_set_msi: used by qemu to inform the hypervisor of the iommu vector
number in guest space. The hypervisor needs this vector to inject an MSI
into the guest after writing PPR logs back to the guest.

iommu_bind_bdf: used by xl to bind a virtual bdf to a machine bdf for
passthru devices. The IOMMU emulator receives iommu commands from the
guest OS and then forwards them to the host iommu. Virtual device ids in
guest iommu commands must be converted into physical ids before sending
them to the real hardware.

Signed-off-by: Wei Wang <wei.wang2@amd.com>

diff -r f61120046915 -r 810296995a89 xen/drivers/passthrough/amd/iommu_guest.c
--- a/xen/drivers/passthrough/amd/iommu_guest.c	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/drivers/passthrough/amd/iommu_guest.c	Thu Mar 08 13:36:51 2012 +0100
@@ -48,14 +48,31 @@
         (reg)->hi = (val) >> 32; \
     } while (0)
 
-static unsigned int machine_bdf(struct domain *d, uint16_t guest_bdf)
+static unsigned int machine_bdf(struct domain *d, uint16_t guest_seg,
+                                uint16_t guest_bdf)
 {
-    return guest_bdf;
+    struct pci_dev *pdev;
+    uint16_t mbdf = 0;
+
+    for_each_pdev( d, pdev )
+    {
+        if ( (pdev->gbdf == guest_bdf) && (pdev->gseg == guest_seg) )
+        {
+            mbdf = PCI_BDF2(pdev->bus, pdev->devfn);
+            break;
+        }
+    }
+    return mbdf;
 }
 
-static uint16_t guest_bdf(struct domain *d, uint16_t machine_bdf)
+static uint16_t guest_bdf(struct domain *d, uint16_t machine_seg,
+                          uint16_t machine_bdf)
 {
-    return machine_bdf;
+    struct pci_dev *pdev;
+
+    pdev = pci_get_pdev_by_domain(d, machine_seg, PCI_BUS(machine_bdf),
+                                  PCI_DEVFN2(machine_bdf));
+    return pdev->gbdf;
 }
 
 static inline struct guest_iommu *domain_iommu(struct domain *d)
@@ -207,7 +224,7 @@ void guest_iommu_add_ppr_log(struct doma
     log = log_base + tail % (PAGE_SIZE / sizeof(ppr_entry_t));
 
     /* Convert physical device id back into virtual device id */
-    gdev_id = guest_bdf(d, iommu_get_devid_from_cmd(entry[0]));
+    gdev_id = guest_bdf(d, 0, iommu_get_devid_from_cmd(entry[0]));
     iommu_set_devid_to_cmd(&entry[0], gdev_id);
 
     memcpy(log, entry, sizeof(ppr_entry_t));
@@ -256,7 +273,7 @@ void guest_iommu_add_event_log(struct do
     log = log_base + tail % (PAGE_SIZE / sizeof(event_entry_t));
 
     /* re-write physical device id into virtual device id */
-    dev_id = guest_bdf(d, iommu_get_devid_from_cmd(entry[0]));
+    dev_id = guest_bdf(d, 0, iommu_get_devid_from_cmd(entry[0]));
     iommu_set_devid_to_cmd(&entry[0], dev_id);
 
     memcpy(log, entry, sizeof(event_entry_t));
@@ -278,7 +295,7 @@ static int do_complete_ppr_request(struc
     uint16_t dev_id;
     struct amd_iommu *iommu;
 
-    dev_id = machine_bdf(d, iommu_get_devid_from_cmd(cmd->data[0]));
+    dev_id = machine_bdf(d, 0, iommu_get_devid_from_cmd(cmd->data[0]));
     iommu = find_iommu_for_device(0, dev_id);
 
     if ( !iommu )
@@ -330,7 +347,7 @@ static int do_invalidate_iotlb_pages(str
     struct amd_iommu *iommu;
     uint16_t dev_id;
 
-    dev_id = machine_bdf(d, iommu_get_devid_from_cmd(cmd->data[0]));
+    dev_id = machine_bdf(d, 0, iommu_get_devid_from_cmd(cmd->data[0]));
     iommu = find_iommu_for_device(0, dev_id);
 
     if ( !iommu )
@@ -409,7 +426,7 @@ static int do_invalidate_dte(struct doma
     g_iommu = domain_iommu(d);
     gbdf = iommu_get_devid_from_cmd(cmd->data[0]);
-    mbdf = machine_bdf(d, gbdf);
+    mbdf = machine_bdf(d, 0, gbdf);
 
     /* Guest can only update DTEs for its passthru devices */
     if ( mbdf == 0 || gbdf == 0 )
@@ -916,3 +933,44 @@ const struct hvm_mmio_handler iommu_mmio
     .read_handler = guest_iommu_mmio_read,
     .write_handler = guest_iommu_mmio_write
 };
+
+/* iommu hypercall handler */
+int iommu_bind_bdf(struct domain* d, uint16_t gseg, uint16_t gbdf,
+                   uint16_t mseg, uint16_t mbdf)
+{
+    struct pci_dev *pdev;
+    int ret = -ENODEV;
+
+    if ( !iommu_found() || !iommu_enabled || !iommuv2_enabled )
+        return 0;
+
+    spin_lock(&pcidevs_lock);
+
+    for_each_pdev( d, pdev )
+    {
+        if ( (pdev->seg != mseg) || (pdev->bus != PCI_BUS(mbdf) ) ||
+             (pdev->devfn != PCI_DEVFN2(mbdf)) )
+            continue;
+
+        pdev->gseg = gseg;
+        pdev->gbdf = gbdf;
+        ret = 0;
+    }
+
+    spin_unlock(&pcidevs_lock);
+    return ret;
+}
+
+void iommu_set_msi(struct domain* d, uint8_t vector, uint8_t dest,
+                   uint8_t dest_mode, uint8_t delivery_mode, uint8_t trig_mode)
+{
+    struct guest_iommu *iommu = domain_iommu(d);
+
+    if ( !iommu )
+        return;
+
+    iommu->msi.vector = vector;
+    iommu->msi.dest = dest;
+    iommu->msi.dest_mode = dest_mode;
+    iommu->msi.trig_mode = trig_mode;
+}
diff -r f61120046915 -r 810296995a89 xen/drivers/passthrough/iommu.c
--- a/xen/drivers/passthrough/iommu.c	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/drivers/passthrough/iommu.c	Thu Mar 08 13:36:51 2012 +0100
@@ -653,6 +653,40 @@ int iommu_do_domctl(
         put_domain(d);
         break;
 
+    case XEN_DOMCTL_guest_iommu_op:
+    {
+        xen_domctl_guest_iommu_op_t *guest_op;
+
+        if ( unlikely((d = get_domain_by_id(domctl->domain)) == NULL) )
+        {
+            gdprintk(XENLOG_ERR,
+                "XEN_DOMCTL_guest_iommu_op: get_domain_by_id() failed\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        guest_op = &(domctl->u.guest_iommu_op);
+        switch ( guest_op->op )
+        {
+        case XEN_DOMCTL_GUEST_IOMMU_OP_SET_MSI:
+            iommu_set_msi(d, guest_op->u.msi.vector,
+                          guest_op->u.msi.dest,
+                          guest_op->u.msi.dest_mode,
+                          guest_op->u.msi.delivery_mode,
+                          guest_op->u.msi.trig_mode);
+            ret = 0;
+            break;
+        case XEN_DOMCTL_GUEST_IOMMU_OP_BIND_BDF:
+            ret = iommu_bind_bdf(d, guest_op->u.bdf_bind.g_seg,
+                                 guest_op->u.bdf_bind.g_bdf,
+                                 guest_op->u.bdf_bind.m_seg,
+                                 guest_op->u.bdf_bind.m_bdf);
+            break;
+        }
+        put_domain(d);
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff -r f61120046915 -r 810296995a89 xen/include/public/domctl.h
--- a/xen/include/public/domctl.h	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/include/public/domctl.h	Thu Mar 08 13:36:51 2012 +0100
@@ -820,6 +820,31 @@ struct xen_domctl_set_access_required {
 typedef struct xen_domctl_set_access_required xen_domctl_set_access_required_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_access_required_t);
 
+/* Support for guest iommu emulation */
+struct xen_domctl_guest_iommu_op {
+    /* XEN_DOMCTL_GUEST_IOMMU_OP_* */
+#define XEN_DOMCTL_GUEST_IOMMU_OP_SET_MSI  0
+#define XEN_DOMCTL_GUEST_IOMMU_OP_BIND_BDF 1
+    uint8_t op;
+    union {
+        struct iommu_msi {
+            uint8_t vector;
+            uint8_t dest;
+            uint8_t dest_mode;
+            uint8_t delivery_mode;
+            uint8_t trig_mode;
+        } msi;
+        struct bdf_bind {
+            uint16_t g_seg;
+            uint16_t g_bdf;
+            uint16_t m_seg;
+            uint16_t m_bdf;
+        } bdf_bind;
+    } u;
+};
+typedef struct xen_domctl_guest_iommu_op xen_domctl_guest_iommu_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_guest_iommu_op_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -885,6 +910,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_set_access_required           64
 #define XEN_DOMCTL_audit_p2m                     65
 #define XEN_DOMCTL_set_virq_handler              66
+#define XEN_DOMCTL_guest_iommu_op                67
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -933,6 +959,7 @@ struct xen_domctl {
         struct xen_domctl_debug_op          debug_op;
         struct xen_domctl_mem_event_op      mem_event_op;
         struct xen_domctl_mem_sharing_op    mem_sharing_op;
+        struct xen_domctl_guest_iommu_op    guest_iommu_op;
 #if defined(__i386__) || defined(__x86_64__)
         struct xen_domctl_cpuid             cpuid;
         struct xen_domctl_vcpuextstate      vcpuextstate;
diff -r f61120046915 -r 810296995a89 xen/include/xen/iommu.h
--- a/xen/include/xen/iommu.h	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/include/xen/iommu.h	Thu Mar 08 13:36:51 2012 +0100
@@ -164,6 +164,11 @@ int iommu_do_domctl(struct xen_domctl *,
 void iommu_iotlb_flush(struct domain *d, unsigned long gfn,
                        unsigned int page_count);
 void iommu_iotlb_flush_all(struct domain *d);
 
+/* Only used by AMD IOMMU so far */
+void iommu_set_msi(struct domain* d, uint8_t vector, uint8_t dest,
+                   uint8_t dest_mode, uint8_t delivery_mode, uint8_t trig_mode);
+int iommu_bind_bdf(struct domain* d, uint16_t gseg, uint16_t gbdf,
+                   uint16_t mseg, uint16_t mbdf);
+
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
  * avoid unecessary iotlb_flush in the low level IOMMU code.
diff -r f61120046915 -r 810296995a89 xen/include/xen/pci.h
--- a/xen/include/xen/pci.h	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/include/xen/pci.h	Thu Mar 08 13:36:51 2012 +0100
@@ -62,6 +62,11 @@ struct pci_dev {
     const u16 seg;
     const u8 bus;
     const u8 devfn;
+
+    /* Used by iommu to represent virtual seg and bdf value in guest space */
+    u16 gseg;
+    u16 gbdf;
+
     struct pci_dev_info info;
     struct arch_pci_dev arch;
     u64 vf_rlen[6];
Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 2 of 6 V6] amd iommu: call guest_iommu_set_base from hvmloader
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210214 -3600
# Node ID e9d74ec1077472f9127c43903811ce3107fc038d
# Parent  810296995a893c1bec898a366f08520fbae755fb
amd iommu: call guest_iommu_set_base from hvmloader.

The IOMMU MMIO base address is dynamically allocated by firmware. This
patch allows hvmloader to notify the hypervisor where the iommu mmio
pages are.

Signed-off-by: Wei Wang <wei.wang2@amd.com>

diff -r 810296995a89 -r e9d74ec10774 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Thu Mar 08 13:36:51 2012 +0100
+++ b/xen/arch/x86/hvm/hvm.c	Thu Mar 08 13:36:54 2012 +0100
@@ -66,6 +66,7 @@
 #include <asm/mem_event.h>
 #include <asm/mem_access.h>
 #include <public/mem_event.h>
+#include <asm/hvm/svm/amd-iommu-proto.h>
 
 bool_t __read_mostly hvm_enabled;
 
@@ -3786,6 +3787,9 @@ long do_hvm_op(unsigned long op, XEN_GUE
             case HVM_PARAM_BUFIOREQ_EVTCHN:
                 rc = -EINVAL;
                 break;
+            case HVM_PARAM_IOMMU_BASE:
+                rc = guest_iommu_set_base(d, a.value);
+                break;
             }
 
             if ( rc == 0 )
diff -r 810296995a89 -r e9d74ec10774 xen/include/public/hvm/params.h
--- a/xen/include/public/hvm/params.h	Thu Mar 08 13:36:51 2012 +0100
+++ b/xen/include/public/hvm/params.h	Thu Mar 08 13:36:54 2012 +0100
@@ -141,7 +141,8 @@
 /* Boolean: Enable nestedhvm (hvm only) */
 #define HVM_PARAM_NESTEDHVM    24
 
+#define HVM_PARAM_IOMMU_BASE   27
 
-#define HVM_NR_PARAMS          27
+#define HVM_NR_PARAMS          28
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 3 of 6 V6] hvmloader: Build IVRS table
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210217 -3600
# Node ID d0611a8ee06d3f34de1c7c51da8571d9e1a668e1
# Parent  e9d74ec1077472f9127c43903811ce3107fc038d
hvmloader: Build IVRS table.

There are 32 ivrs padding entries allocated at the beginning. If a
passthru device has been found on the qemu bus, a padding entry will be
replaced by a real device entry.

This patch has been tested with both rombios and seabios.

Signed-off-by: Wei Wang <wei.wang2@amd.com>

diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/acpi/acpi2_0.h
--- a/tools/firmware/hvmloader/acpi/acpi2_0.h	Thu Mar 08 13:36:54 2012 +0100
+++ b/tools/firmware/hvmloader/acpi/acpi2_0.h	Thu Mar 08 13:36:57 2012 +0100
@@ -389,6 +389,60 @@ struct acpi_20_madt_intsrcovr {
 #define ACPI_2_0_WAET_REVISION 0x01
 #define ACPI_1_0_FADT_REVISION 0x01
 
+#define IVRS_SIGNATURE ASCII32('I','V','R','S')
+#define IVRS_REVISION  1
+#define IVRS_VASIZE    64
+#define IVRS_PASIZE    52
+#define IVRS_GVASIZE   64
+
+#define IVHD_BLOCK_TYPE     0x10
+#define IVHD_FLAG_HTTUNEN   (1 << 0)
+#define IVHD_FLAG_PASSPW    (1 << 1)
+#define IVHD_FLAG_RESPASSPW (1 << 2)
+#define IVHD_FLAG_ISOC      (1 << 3)
+#define IVHD_FLAG_IOTLBSUP  (1 << 4)
+#define IVHD_FLAG_COHERENT  (1 << 5)
+#define IVHD_FLAG_PREFSUP   (1 << 6)
+#define IVHD_FLAG_PPRSUP    (1 << 7)
+
+#define IVHD_EFR_GTSUP      (1 << 2)
+#define IVHD_EFR_IASUP      (1 << 5)
+
+#define IVHD_SELECT_4_BYTE  0x2
+
+struct ivrs_ivhd_block
+{
+    uint8_t  type;
+    uint8_t  flags;
+    uint16_t length;
+    uint16_t devid;
+    uint16_t cap_offset;
+    uint64_t iommu_base_addr;
+    uint16_t pci_segment;
+    uint16_t iommu_info;
+    uint32_t reserved;
+};
+
+/* IVHD 4-byte device entries */
+struct ivrs_ivhd_device
+{
+    uint8_t  type;
+    uint16_t dev_id;
+    uint8_t  flags;
+};
+
+#define PT_DEV_MAX_NR    32
+#define IOMMU_CAP_OFFSET 0x40
+struct acpi_40_ivrs
+{
+    struct acpi_header header;
+    uint32_t iv_info;
+    uint32_t reserved[2];
+    struct ivrs_ivhd_block ivhd_block;
+    struct ivrs_ivhd_device ivhd_device[PT_DEV_MAX_NR];
+};
+
+
 #pragma pack ()
 
 struct acpi_config {
diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/acpi/build.c
--- a/tools/firmware/hvmloader/acpi/build.c	Thu Mar 08 13:36:54 2012 +0100
+++ b/tools/firmware/hvmloader/acpi/build.c	Thu Mar 08 13:36:57 2012 +0100
@@ -23,6 +23,8 @@
 #include "ssdt_pm.h"
 #include "../config.h"
 #include "../util.h"
+#include "../hypercall.h"
+#include <xen/hvm/params.h>
 
 #define align16(sz)        (((sz) + 15) & ~15)
 #define fixed_strcpy(d, s) strncpy((d), (s), sizeof(d))
@@ -198,6 +200,87 @@ static struct acpi_20_waet *construct_wa
     return waet;
 }
 
+extern uint32_t ptdev_bdf[PT_DEV_MAX_NR];
+extern uint32_t ptdev_nr;
+extern uint32_t iommu_bdf;
+static struct acpi_40_ivrs* construct_ivrs(void)
+{
+    struct acpi_40_ivrs *ivrs;
+    uint64_t mmio;
+    struct ivrs_ivhd_block *ivhd;
+    struct ivrs_ivhd_device *dev_entry;
+    struct xen_hvm_param p;
+
+    if (ptdev_nr == 0 || iommu_bdf == 0) return NULL;
+
+    ivrs = mem_alloc(sizeof(*ivrs), 16);
+    if (!ivrs)
+    {
+        printf("unable to build IVRS tables: out of memory\n");
+        return NULL;
+    }
+    memset(ivrs, 0, sizeof(*ivrs));
+
+    /* initialize acpi header */
+    ivrs->header.signature = IVRS_SIGNATURE;
+    ivrs->header.revision = IVRS_REVISION;
+    fixed_strcpy(ivrs->header.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(ivrs->header.oem_table_id, ACPI_OEM_TABLE_ID);
+
+    ivrs->header.oem_revision = ACPI_OEM_REVISION;
+    ivrs->header.creator_id = ACPI_CREATOR_ID;
+    ivrs->header.creator_revision = ACPI_CREATOR_REVISION;
+
+    ivrs->header.length = sizeof(*ivrs);
+
+    /* initialize IVHD Block */
+    ivhd = &ivrs->ivhd_block;
+    ivrs->iv_info = (IVRS_VASIZE << 15) | (IVRS_PASIZE << 8) |
+                    (IVRS_GVASIZE << 5);
+
+    ivhd->type = IVHD_BLOCK_TYPE;
+    ivhd->flags = IVHD_FLAG_PPRSUP | IVHD_FLAG_IOTLBSUP;
+    ivhd->devid = iommu_bdf;
+    ivhd->cap_offset = IOMMU_CAP_OFFSET;
+
+    /* reserve 32K IOMMU MMIO space */
+    mmio = virt_to_phys(mem_alloc(0x8000, 0x1000));
+    if (!mmio)
+    {
+        printf("unable to reserve iommu mmio pages: out of memory\n");
+        return NULL;
+    }
+
+    p.domid = DOMID_SELF;
+    p.index = HVM_PARAM_IOMMU_BASE;
+    p.value = mmio;
+
+    /* Returns non-zero if IOMMUv2 hardware is not available */
+    if ( hypercall_hvm_op(HVMOP_set_param, &p) )
+    {
+        printf("unable to set iommu mmio base address\n");
+        return NULL;
+    }
+
+    ivhd->iommu_base_addr = mmio;
+    ivhd->reserved = IVHD_EFR_IASUP | IVHD_EFR_GTSUP;
+
+    /* Build IVHD device entries */
+    dev_entry = ivrs->ivhd_device;
+    for ( int i = 0; i < ptdev_nr; i++ )
+    {
+        dev_entry[i].type = IVHD_SELECT_4_BYTE;
+        dev_entry[i].dev_id = ptdev_bdf[i];
+        dev_entry[i].flags = 0;
+    }
+
+    ivhd->length = sizeof(*ivhd) + sizeof(*dev_entry) * PT_DEV_MAX_NR;
+    set_checksum(ivrs, offsetof(struct acpi_header, checksum),
+                 ivrs->header.length);
+
+    return ivrs;
+}
+
 static int construct_secondary_tables(unsigned long *table_ptrs,
                                       struct acpi_info *info)
 {
@@ -206,6 +289,7 @@ static int construct_secondary_tables(un
     struct acpi_20_hpet *hpet;
     struct acpi_20_waet *waet;
     struct acpi_20_tcpa *tcpa;
+    struct acpi_40_ivrs *ivrs;
     unsigned char *ssdt;
     static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
     uint16_t *tis_hdr;
@@ -293,6 +377,13 @@ static int construct_secondary_tables(un
         }
     }
 
+    if ( !strncmp(xenstore_read("guest_iommu", "1"), "1", 1) )
+    {
+        ivrs = construct_ivrs();
+        if ( ivrs != NULL )
+            table_ptrs[nr_tables++] = (unsigned long)ivrs;
+    }
+
     table_ptrs[nr_tables] = 0;
     return nr_tables;
 }
diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/pci.c
--- a/tools/firmware/hvmloader/pci.c	Thu Mar 08 13:36:54 2012 +0100
+++ b/tools/firmware/hvmloader/pci.c	Thu Mar 08 13:36:57 2012 +0100
@@ -34,11 +34,17 @@ unsigned long pci_mem_end = PCI_MEM_END;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/* support up to 32 passthrough devices */
+#define PT_DEV_MAX_NR 32
+uint32_t ptdev_bdf[PT_DEV_MAX_NR];
+uint32_t ptdev_nr;
+uint32_t iommu_bdf = 0;
+
 void pci_setup(void)
 {
     uint32_t base, devfn, bar_reg, bar_data, bar_sz, cmd, mmio_total = 0;
     uint32_t vga_devfn = 256;
-    uint16_t class, vendor_id, device_id;
+    uint16_t class, vendor_id, device_id, sub_vendor_id;
     unsigned int bar, pin, link, isa_irq;
 
     /* Resources assignable to PCI devices via BARs. */
@@ -72,12 +78,34 @@ void pci_setup(void)
         class     = pci_readw(devfn, PCI_CLASS_DEVICE);
         vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
         device_id = pci_readw(devfn, PCI_DEVICE_ID);
+        sub_vendor_id = pci_readw(devfn, PCI_SUBSYSTEM_VENDOR_ID);
+
         if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
             continue;
 
         ASSERT((devfn != PCI_ISA_DEVFN) ||
                ((vendor_id == 0x8086) && (device_id == 0x7000)));
 
+        /* Found amd iommu device. */
+        if ( class == 0x0806 && vendor_id == 0x1022 )
+        {
+            iommu_bdf = devfn;
+            continue;
+        }
+        /* IVRS: Detecting passthrough devices.
+         * sub_vendor_id != citrix && sub_vendor_id != qemu */
+        if ( sub_vendor_id != 0x5853 && sub_vendor_id != 0x1af4 )
+        {
+            /* found a passthru device */
+            if ( ptdev_nr < PT_DEV_MAX_NR )
+            {
+                ptdev_bdf[ptdev_nr] = devfn;
+                ptdev_nr++;
+            }
+            else
+                printf("Number of passthru devices > PT_DEV_MAX_NR \n");
+        }
+
         switch ( class )
         {
         case 0x0300:
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210219 -3600
# Node ID d3c4ac0b7cc771ce9639cf3f9f14c0e10d85784d
# Parent  d0611a8ee06d3f34de1c7c51da8571d9e1a668e1
libxc: add wrappers for new hypercalls

Please see patch 1 for the hypercall description.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff -r d0611a8ee06d -r d3c4ac0b7cc7 tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c	Thu Mar 08 13:36:57 2012 +0100
+++ b/tools/libxc/xc_domain.c	Thu Mar 08 13:36:59 2012 +0100
@@ -1352,6 +1352,59 @@ int xc_domain_bind_pt_isa_irq(
                                   PT_IRQ_TYPE_ISA, 0, 0, 0, machine_irq));
 }
 
+int xc_domain_update_iommu_msi(
+    xc_interface *xch,
+    uint32_t domid,
+    uint8_t vector,
+    uint8_t dest,
+    uint8_t dest_mode,
+    uint8_t delivery_mode,
+    uint8_t trig_mode)
+{
+    int rc;
+    DECLARE_DOMCTL;
+    xen_domctl_guest_iommu_op_t * iommu_op;
+
+    domctl.cmd = XEN_DOMCTL_guest_iommu_op;
+    domctl.domain = (domid_t)domid;
+
+    iommu_op = &(domctl.u.guest_iommu_op);
+    iommu_op->op = XEN_DOMCTL_GUEST_IOMMU_OP_SET_MSI;
+    iommu_op->u.msi.vector = vector;
+    iommu_op->u.msi.dest = dest;
+    iommu_op->u.msi.dest_mode = dest_mode;
+    iommu_op->u.msi.delivery_mode = delivery_mode;
+    iommu_op->u.msi.trig_mode = trig_mode;
+
+    rc = do_domctl(xch, &domctl);
+    return rc;
+}
+
+int xc_domain_bind_pt_bdf(xc_interface *xch,
+                          uint32_t domid,
+                          uint16_t gseg,
+                          uint16_t gbdf,
+                          uint16_t mseg,
+                          uint16_t mbdf)
+{
+    int rc;
+    DECLARE_DOMCTL;
+    xen_domctl_guest_iommu_op_t * guest_op;
+
+    domctl.cmd = XEN_DOMCTL_guest_iommu_op;
+    domctl.domain = (domid_t)domid;
+
+    guest_op = &(domctl.u.guest_iommu_op);
+    guest_op->op = XEN_DOMCTL_GUEST_IOMMU_OP_BIND_BDF;
+    guest_op->u.bdf_bind.g_seg = gseg;
+    guest_op->u.bdf_bind.g_bdf = gbdf;
+    guest_op->u.bdf_bind.m_seg = mseg;
+    guest_op->u.bdf_bind.m_bdf = mbdf;
+
+    rc = do_domctl(xch, &domctl);
+    return rc;
+}
+
 int xc_domain_memory_mapping(
     xc_interface *xch,
     uint32_t domid,
diff -r d0611a8ee06d -r d3c4ac0b7cc7 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h	Thu Mar 08 13:36:57 2012 +0100
+++ b/tools/libxc/xenctrl.h	Thu Mar 08 13:36:59 2012 +0100
@@ -1734,6 +1734,21 @@ int xc_domain_bind_pt_isa_irq(xc_interfa
                               uint32_t domid,
                               uint8_t machine_irq);
 
+int xc_domain_bind_pt_bdf(xc_interface *xch,
+                          uint32_t domid,
+                          uint16_t gseg,
+                          uint16_t gbdf,
+                          uint16_t mseg,
+                          uint16_t mbdf);
+
+int xc_domain_update_iommu_msi(xc_interface *xch,
+                               uint32_t domid,
+                               uint8_t vector,
+                               uint8_t dest,
+                               uint8_t dest_mode,
+                               uint8_t delivery_mode,
+                               uint8_t trig_mode);
+
 int xc_domain_set_machine_address_size(xc_interface *xch,
                                        uint32_t domid,
                                        unsigned int width);
Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 5 of 6 V6] libxl: bind virtual bdf to physical bdf after device assignment
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210222 -3600
# Node ID 0a1de2dea27370d71d2572869d363d9e5833648e
# Parent  d3c4ac0b7cc771ce9639cf3f9f14c0e10d85784d
libxl: bind virtual bdf to physical bdf after device assignment

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff -r d3c4ac0b7cc7 -r 0a1de2dea273 tools/libxl/libxl_pci.c
--- a/tools/libxl/libxl_pci.c	Thu Mar 08 13:36:59 2012 +0100
+++ b/tools/libxl/libxl_pci.c	Thu Mar 08 13:37:02 2012 +0100
@@ -720,6 +720,13 @@ out:
             LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_assign_device failed");
             return ERROR_FAIL;
         }
+        if (LIBXL__DOMAIN_IS_TYPE(gc, domid, HVM)) {
+            rc = xc_domain_bind_pt_bdf(ctx->xch, domid, 0, pcidev->vdevfn, pcidev->domain, pcidev_encode_bdf(pcidev));
+            if ( rc ) {
+                LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_bind_pt_bdf failed");
+                return ERROR_FAIL;
+            }
+        }
     }
 
     if (!starting)
Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 6 of 6 V6] libxl: Introduce a new guest config file parameter
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210225 -3600
# Node ID 4c16ebeae0adace4493299b2f6d9c74ce8e6f889
# Parent  0a1de2dea27370d71d2572869d363d9e5833648e
libxl: Introduce a new guest config file parameter

Use guest_iommu = {1,0} to enable or disable guest iommu emulation.
The default value is 0. Regression tests have been done to make sure
it does not break non-iommuv2 systems.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff -r 0a1de2dea273 -r 4c16ebeae0ad docs/man/xl.cfg.pod.5
--- a/docs/man/xl.cfg.pod.5	Thu Mar 08 13:37:02 2012 +0100
+++ b/docs/man/xl.cfg.pod.5	Thu Mar 08 13:37:05 2012 +0100
@@ -851,6 +851,11 @@ certainly belong in a more appropriate s
 Enable graphics device PCI passthrough. XXX which device is passed
 through ?
 
+=item B<guest_iommu=BOOLEAN>
+
+Enable a virtual iommu device for an HVM guest. It should be enabled to
+pass through AMD HD7900 series GPGPUs.
+
 =item B<nomigrate=BOOLEAN>
 
 Disable migration of this domain.  This enables certain other features
diff -r 0a1de2dea273 -r 4c16ebeae0ad tools/libxl/libxl_create.c
--- a/tools/libxl/libxl_create.c	Thu Mar 08 13:37:02 2012 +0100
+++ b/tools/libxl/libxl_create.c	Thu Mar 08 13:37:05 2012 +0100
@@ -146,6 +146,7 @@ int libxl__domain_build_info_setdefault(
         libxl_defbool_setdefault(&b_info->u.hvm.hpet,               true);
         libxl_defbool_setdefault(&b_info->u.hvm.vpt_align,          true);
         libxl_defbool_setdefault(&b_info->u.hvm.nested_hvm,         false);
+        libxl_defbool_setdefault(&b_info->u.hvm.guest_iommu,        false);
         libxl_defbool_setdefault(&b_info->u.hvm.incr_generationid,  false);
         libxl_defbool_setdefault(&b_info->u.hvm.usb,                false);
         libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
@@ -237,13 +238,15 @@ int libxl__domain_build(libxl__gc *gc,
         vments[4] = "start_time";
         vments[5] = libxl__sprintf(gc, "%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/10000);
 
-        localents = libxl__calloc(gc, 7, sizeof(char *));
+        localents = libxl__calloc(gc, 9, sizeof(char *));
         localents[0] = "platform/acpi";
         localents[1] = libxl_defbool_val(info->u.hvm.acpi) ? "1" : "0";
         localents[2] = "platform/acpi_s3";
         localents[3] = libxl_defbool_val(info->u.hvm.acpi_s3) ? "1" : "0";
         localents[4] = "platform/acpi_s4";
         localents[5] = libxl_defbool_val(info->u.hvm.acpi_s4) ? "1" : "0";
+        localents[6] = "guest_iommu";
+        localents[7] = libxl_defbool_val(info->u.hvm.guest_iommu) ? "1" : "0";
 
         break;
     case LIBXL_DOMAIN_TYPE_PV:
diff -r 0a1de2dea273 -r 4c16ebeae0ad tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl	Thu Mar 08 13:37:02 2012 +0100
+++ b/tools/libxl/libxl_types.idl	Thu Mar 08 13:37:05 2012 +0100
@@ -268,6 +268,7 @@ libxl_domain_build_info = Struct("domain
                                        ("vpt_align",        libxl_defbool),
                                        ("timer_mode",       libxl_timer_mode),
                                        ("nested_hvm",       libxl_defbool),
+                                       ("guest_iommu",      libxl_defbool),
                                        ("incr_generationid",libxl_defbool),
                                        ("nographic",        libxl_defbool),
                                        ("stdvga",           libxl_defbool),
diff -r 0a1de2dea273 -r 4c16ebeae0ad tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c	Thu Mar 08 13:37:02 2012 +0100
+++ b/tools/libxl/xl_cmdimpl.c	Thu Mar 08 13:37:05 2012 +0100
@@ -749,6 +749,7 @@ static void parse_config_data(const char
         }
 
         xlu_cfg_get_defbool(config, "nestedhvm", &b_info->u.hvm.nested_hvm, 0);
+        xlu_cfg_get_defbool(config, "guest_iommu", &b_info->u.hvm.guest_iommu, 0);
 
         break;
     case LIBXL_DOMAIN_TYPE_PV: {
Jan Beulich
2012-Mar-08 13:35 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
>>> On 08.03.12 at 14:21, Wei Wang <wei.wang2@amd.com> wrote:
> # HG changeset patch
> # User Wei Wang <wei.wang2@amd.com>
> # Date 1331210211 -3600
> # Node ID 810296995a893c1bec898a366f08520fbae755fb
> # Parent  f611200469159a960c3f0b975aa0f91ec768e29f
> amd iommu: Add 2 hypercalls for libxc
>
> iommu_set_msi: used by qemu to inform hypervisor iommu vector number in
> guest space. Hypervisor needs this vector to inject msi into guest after
> writing PPR logs back to guest.
>
> iommu_bind_bdf: used by xl to bind virtual bdf to machine bdf for passthru
> devices. IOMMU emulator receives iommu cmd from guest OS and then forwards
> them to host iommu. Virtual device ids in guest iommu commands must be
> converted into physical ids before sending them to real hardware.

So patch 5 uses the latter one, but the former one is still dead code?

Jan
Wei Wang
2012-Mar-08 14:18 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
On 03/08/2012 02:35 PM, Jan Beulich wrote:
>>>> On 08.03.12 at 14:21, Wei Wang<wei.wang2@amd.com> wrote:
>> # HG changeset patch
>> # User Wei Wang<wei.wang2@amd.com>
>> # Date 1331210211 -3600
>> # Node ID 810296995a893c1bec898a366f08520fbae755fb
>> # Parent  f611200469159a960c3f0b975aa0f91ec768e29f
>> amd iommu: Add 2 hypercalls for libxc
>>
>> iommu_set_msi: used by qemu to inform hypervisor iommu vector number in
>> guest space. Hypervisor needs this vector to inject msi into guest after
>> writing PPR logs back to guest.
>>
>> iommu_bind_bdf: used by xl to bind virtual bdf to machine bdf for passthru
>> devices. IOMMU emulator receives iommu cmd from guest OS and then forwards
>> them to host iommu. Virtual device ids in guest iommu commands must be
>> converted into physical ids before sending them to real hardware.
>
> So patch 5 uses the latter one, but the former one is still dead code?
>
> Jan

Hi Jan,

The former one will be used by qemu. But Ian said we would not add new
features to qemu-traditional, so I am waiting for the PCI passthru
patches for upstream qemu to be accepted; then I could send out my qemu
patch. I also have a proof-of-concept patch that was based on the old
qemu. I could rebase and resend it if necessary.
http://lists.xen.org/archives/html/xen-devel/2011-12/msg01269.html

thanks,
Wei
I think this IVRS table should be vendor-specific, and we should have a
mechanism to make it work only for the AMD IOMMU. Intel has similar
support in the next generation of VT-d: a DMAR table likewise needs to
be built when a virtual VT-d is enabled for the guest. I suggest this
table only be built when the guest is running on AMD platforms.

Thanks!
Xiantao

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Wei Wang
> Sent: Thursday, March 08, 2012 9:22 PM
> To: Ian.Jackson@eu.citrix.com; Ian.Campbell@citrix.com; JBeulich@suse.com;
> keir@xen.org
> Cc: xen-devel@lists.xensource.com
> Subject: [Xen-devel] [PATCH 3 of 6 V6] hvmloader: Build IVRS table
>
> # HG changeset patch
> # User Wei Wang <wei.wang2@amd.com>
> # Date 1331210217 -3600
> # Node ID d0611a8ee06d3f34de1c7c51da8571d9e1a668e1
> # Parent e9d74ec1077472f9127c43903811ce3107fc038d
> hvmloader: Build IVRS table.
>
> There are 32 ivrs padding entries allocated at the beginning. If a passthru
> device has been found from qemu bus, a padding entry will be replaced by a
> real device entry.
>
> This patch has been tested with both rombios and seabios
>
> Signed-off-by: Wei Wang <wei.wang2@amd.com>
Jan Beulich
2012-Mar-08 14:41 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
>>> On 08.03.12 at 15:18, Wei Wang <wei.wang2@amd.com> wrote:
> On 03/08/2012 02:35 PM, Jan Beulich wrote:
>>>>> On 08.03.12 at 14:21, Wei Wang <wei.wang2@amd.com> wrote:
>>> # HG changeset patch
>>> # User Wei Wang <wei.wang2@amd.com>
>>> # Date 1331210211 -3600
>>> # Node ID 810296995a893c1bec898a366f08520fbae755fb
>>> # Parent  f611200469159a960c3f0b975aa0f91ec768e29f
>>> amd iommu: Add 2 hypercalls for libxc
>>>
>>> iommu_set_msi: used by qemu to inform the hypervisor of the iommu vector
>>> number in guest space. The hypervisor needs this vector to inject an msi
>>> into the guest after writing PPR logs back to the guest.
>>>
>>> iommu_bind_bdf: used by xl to bind a virtual bdf to a machine bdf for
>>> passthru devices. The IOMMU emulator receives iommu commands from the
>>> guest OS and forwards them to the host iommu. Virtual device ids in guest
>>> iommu commands must be converted into physical ids before they are sent
>>> to the real hardware.
>>
>> So patch 5 uses the latter one, but the former one is still dead code?
>
> The former one will be used by qemu. But Ian said we would not add new
> features to qemu-traditional, so I am waiting for the pci passthru patches
> for upstream qemu to be accepted before sending out my qemu patch. I also
> have a proof-of-concept patch based on the old qemu; I could rebase and
> resend it if necessary.

That sounds fine to me then.

Jan
On 03/08/2012 03:22 PM, Zhang, Xiantao wrote:
> I think this IVRS table should be vendor-specific, and we should have a
> mechanism to make it work only for AMD IOMMU. Intel has similar support in
> the next generation of VT-d, so a DMAR table should likewise be built when
> virtual VT-d is enabled for the guest. I suggest this table be built only
> when the guest runs on AMD platforms.
> Thanks!
> Xiantao

Hi Xiantao,
Thanks for reviewing it. Actually construct_ivrs() invokes the hypercall
guest_iommu_set_base(), which fails on non-iommuv2 systems, including VT-d.
So the IVRS table is already avoided on Intel systems. But I am also
thinking that maybe we should let the user choose the iommu hardware up
front; for example, guest_iommu={vtd, amd, 0} to distinguish different
iommu hardware?
Thanks,
Wei

>> -----Original Message-----
>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>> bounces@lists.xen.org] On Behalf Of Wei Wang
>> Sent: Thursday, March 08, 2012 9:22 PM
>> To: Ian.Jackson@eu.citrix.com; Ian.Campbell@citrix.com; JBeulich@suse.com;
>> keir@xen.org
>> Cc: xen-devel@lists.xensource.com
>> Subject: [Xen-devel] [PATCH 3 of 6 V6] hvmloader: Build IVRS table
>>
>> # HG changeset patch
>> # User Wei Wang <wei.wang2@amd.com>
>> # Date 1331210217 -3600
>> # Node ID d0611a8ee06d3f34de1c7c51da8571d9e1a668e1
>> # Parent  e9d74ec1077472f9127c43903811ce3107fc038d
>> hvmloader: Build IVRS table.
>>
>> There are 32 ivrs padding entries allocated at the beginning. If a passthru
>> device has been found on the qemu bus, a padding entry will be replaced by a
>> real device entry.
This patch has been tested with both rombios and seabios >> >> Signed-off-by: Wei Wang<wei.wang2@amd.com> >> >> diff -r e9d74ec10774 -r d0611a8ee06d >> tools/firmware/hvmloader/acpi/acpi2_0.h >> --- a/tools/firmware/hvmloader/acpi/acpi2_0.h Thu Mar 08 13:36:54 >> 2012 +0100 >> +++ b/tools/firmware/hvmloader/acpi/acpi2_0.h Thu Mar 08 13:36:57 >> 2012 +0100 >> @@ -389,6 +389,60 @@ struct acpi_20_madt_intsrcovr { #define >> ACPI_2_0_WAET_REVISION 0x01 #define ACPI_1_0_FADT_REVISION 0x01 >> >> +#define IVRS_SIGNATURE ASCII32(''I'',''V'',''R'',''S'') >> +#define IVRS_REVISION 1 >> +#define IVRS_VASIZE 64 >> +#define IVRS_PASIZE 52 >> +#define IVRS_GVASIZE 64 >> + >> +#define IVHD_BLOCK_TYPE 0x10 >> +#define IVHD_FLAG_HTTUNEN (1<< 0) >> +#define IVHD_FLAG_PASSPW (1<< 1) >> +#define IVHD_FLAG_RESPASSPW (1<< 2) >> +#define IVHD_FLAG_ISOC (1<< 3) >> +#define IVHD_FLAG_IOTLBSUP (1<< 4) >> +#define IVHD_FLAG_COHERENT (1<< 5) >> +#define IVHD_FLAG_PREFSUP (1<< 6) >> +#define IVHD_FLAG_PPRSUP (1<< 7) >> + >> +#define IVHD_EFR_GTSUP (1<< 2) >> +#define IVHD_EFR_IASUP (1<< 5) >> + >> +#define IVHD_SELECT_4_BYTE 0x2 >> + >> +struct ivrs_ivhd_block >> +{ >> + uint8_t type; >> + uint8_t flags; >> + uint16_t length; >> + uint16_t devid; >> + uint16_t cap_offset; >> + uint64_t iommu_base_addr; >> + uint16_t pci_segment; >> + uint16_t iommu_info; >> + uint32_t reserved; >> +}; >> + >> +/* IVHD 4-byte device entries */ >> +struct ivrs_ivhd_device >> +{ >> + uint8_t type; >> + uint16_t dev_id; >> + uint8_t flags; >> +}; >> + >> +#define PT_DEV_MAX_NR 32 >> +#define IOMMU_CAP_OFFSET 0x40 >> +struct acpi_40_ivrs >> +{ >> + struct acpi_header header; >> + uint32_t iv_info; >> + uint32_t reserved[2]; >> + struct ivrs_ivhd_block ivhd_block; >> + struct ivrs_ivhd_device ivhd_device[PT_DEV_MAX_NR]; >> +}; >> + >> + >> #pragma pack () >> >> struct acpi_config { >> diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/acpi/build.c >> --- a/tools/firmware/hvmloader/acpi/build.c Thu Mar 08 
13:36:54 2012 >> +0100 >> +++ b/tools/firmware/hvmloader/acpi/build.c Thu Mar 08 13:36:57 2012 >> +0100 >> @@ -23,6 +23,8 @@ >> #include "ssdt_pm.h" >> #include "../config.h" >> #include "../util.h" >> +#include "../hypercall.h" >> +#include<xen/hvm/params.h> >> >> #define align16(sz) (((sz) + 15)& ~15) >> #define fixed_strcpy(d, s) strncpy((d), (s), sizeof(d)) @@ -198,6 +200,87 @@ >> static struct acpi_20_waet *construct_wa >> return waet; >> } >> >> +extern uint32_t ptdev_bdf[PT_DEV_MAX_NR]; extern uint32_t ptdev_nr; >> +extern uint32_t iommu_bdf; static struct acpi_40_ivrs* >> +construct_ivrs(void) { >> + struct acpi_40_ivrs *ivrs; >> + uint64_t mmio; >> + struct ivrs_ivhd_block *ivhd; >> + struct ivrs_ivhd_device *dev_entry; >> + struct xen_hvm_param p; >> + >> + if (ptdev_nr == 0 || iommu_bdf == 0) return NULL; >> + >> + ivrs = mem_alloc(sizeof(*ivrs), 16); >> + if (!ivrs) >> + { >> + printf("unable to build IVRS tables: out of memory\n"); >> + return NULL; >> + } >> + memset(ivrs, 0, sizeof(*ivrs)); >> + >> + /* initialize acpi header */ >> + ivrs->header.signature = IVRS_SIGNATURE; >> + ivrs->header.revision = IVRS_REVISION; >> + fixed_strcpy(ivrs->header.oem_id, ACPI_OEM_ID); >> + fixed_strcpy(ivrs->header.oem_table_id, ACPI_OEM_TABLE_ID); >> + >> + ivrs->header.oem_revision = ACPI_OEM_REVISION; >> + ivrs->header.creator_id = ACPI_CREATOR_ID; >> + ivrs->header.creator_revision = ACPI_CREATOR_REVISION; >> + >> + ivrs->header.length = sizeof(*ivrs); >> + >> + /* initialize IVHD Block */ >> + ivhd =&ivrs->ivhd_block; >> + ivrs->iv_info = (IVRS_VASIZE<< 15) | (IVRS_PASIZE<< 8) | >> + (IVRS_GVASIZE<< 5); >> + >> + ivhd->type = IVHD_BLOCK_TYPE; >> + ivhd->flags = IVHD_FLAG_PPRSUP | IVHD_FLAG_IOTLBSUP; >> + ivhd->devid = iommu_bdf; >> + ivhd->cap_offset = IOMMU_CAP_OFFSET; >> + >> + /*reserve 32K IOMMU MMIO space */ >> + mmio = virt_to_phys(mem_alloc(0x8000, 0x1000)); >> + if (!mmio) >> + { >> + printf("unable to reserve iommu mmio pages: out of memory\n"); >> + 
return NULL; >> + } >> + >> + p.domid = DOMID_SELF; >> + p.index = HVM_PARAM_IOMMU_BASE; >> + p.value = mmio; >> + >> + /* Return non-zero if IOMMUv2 hardware is not avaliable */ >> + if ( hypercall_hvm_op(HVMOP_set_param,&p) ) >> + { >> + printf("unable to set iommu mmio base address\n"); >> + return NULL; >> + } >> + >> + ivhd->iommu_base_addr = mmio; >> + ivhd->reserved = IVHD_EFR_IASUP | IVHD_EFR_GTSUP; >> + >> + /* Build IVHD device entries */ >> + dev_entry = ivrs->ivhd_device; >> + for ( int i = 0; i< ptdev_nr; i++ ) >> + { >> + dev_entry[i].type = IVHD_SELECT_4_BYTE; >> + dev_entry[i].dev_id = ptdev_bdf[i]; >> + dev_entry[i].flags = 0; >> + } >> + >> + ivhd->length = sizeof(*ivhd) + sizeof(*dev_entry) * PT_DEV_MAX_NR; >> + set_checksum(ivrs, offsetof(struct acpi_header, checksum), >> + ivrs->header.length); >> + >> + return ivrs; >> +} >> + >> static int construct_secondary_tables(unsigned long *table_ptrs, >> struct acpi_info *info) { @@ -206,6 +289,7 @@ static int >> construct_secondary_tables(un >> struct acpi_20_hpet *hpet; >> struct acpi_20_waet *waet; >> struct acpi_20_tcpa *tcpa; >> + struct acpi_40_ivrs *ivrs; >> unsigned char *ssdt; >> static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001}; >> uint16_t *tis_hdr; >> @@ -293,6 +377,13 @@ static int construct_secondary_tables(un >> } >> } >> >> + if ( !strncmp(xenstore_read("guest_iommu", "1"), "1", 1) ) >> + { >> + ivrs = construct_ivrs(); >> + if ( ivrs != NULL ) >> + table_ptrs[nr_tables++] = (unsigned long)ivrs; >> + } >> + >> table_ptrs[nr_tables] = 0; >> return nr_tables; >> } >> diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/pci.c >> --- a/tools/firmware/hvmloader/pci.c Thu Mar 08 13:36:54 2012 +0100 >> +++ b/tools/firmware/hvmloader/pci.c Thu Mar 08 13:36:57 2012 +0100 >> @@ -34,11 +34,17 @@ unsigned long pci_mem_end = PCI_MEM_END; >> enum virtual_vga virtual_vga = VGA_none; unsigned long >> igd_opregion_pgbase = 0; >> >> +/* support up to 32 passthrough devices */ >> 
+#define PT_DEV_MAX_NR 32 >> +uint32_t ptdev_bdf[PT_DEV_MAX_NR]; >> +uint32_t ptdev_nr; >> +uint32_t iommu_bdf = 0; >> + >> void pci_setup(void) >> { >> uint32_t base, devfn, bar_reg, bar_data, bar_sz, cmd, mmio_total = 0; >> uint32_t vga_devfn = 256; >> - uint16_t class, vendor_id, device_id; >> + uint16_t class, vendor_id, device_id, sub_vendor_id; >> unsigned int bar, pin, link, isa_irq; >> >> /* Resources assignable to PCI devices via BARs. */ @@ -72,12 +78,34 @@ >> void pci_setup(void) >> class = pci_readw(devfn, PCI_CLASS_DEVICE); >> vendor_id = pci_readw(devfn, PCI_VENDOR_ID); >> device_id = pci_readw(devfn, PCI_DEVICE_ID); >> + sub_vendor_id = pci_readw(devfn, PCI_SUBSYSTEM_VENDOR_ID); >> + >> if ( (vendor_id == 0xffff)&& (device_id == 0xffff) ) >> continue; >> >> ASSERT((devfn != PCI_ISA_DEVFN) || >> ((vendor_id == 0x8086)&& (device_id == 0x7000))); >> >> + /* Found amd iommu device. */ >> + if ( class == 0x0806&& vendor_id == 0x1022 ) >> + { >> + iommu_bdf = devfn; >> + continue; >> + } >> + /* IVRS: Detecting passthrough devices. >> + * sub_vendor_id != citrix&& sub_vendor_id != qemu */ >> + if ( sub_vendor_id != 0x5853&& sub_vendor_id != 0x1af4 ) >> + { >> + /* found a passthru device */ >> + if ( ptdev_nr< PT_DEV_MAX_NR ) >> + { >> + ptdev_bdf[ptdev_nr] = devfn; >> + ptdev_nr++; >> + } >> + else >> + printf("Number of passthru devices> PT_DEV_MAX_NR \n"); >> + } >> + >> switch ( class ) >> { >> case 0x0300: >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel >
> -----Original Message-----
> From: Wei Wang [mailto:wei.wang2@amd.com]
> Sent: Thursday, March 08, 2012 10:50 PM
> To: Zhang, Xiantao
> Cc: Ian.Jackson@eu.citrix.com; Ian.Campbell@citrix.com; JBeulich@suse.com;
> keir@xen.org; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH 3 of 6 V6] hvmloader: Build IVRS table
>
> On 03/08/2012 03:22 PM, Zhang, Xiantao wrote:
> > I think this IVRS table should be vendor-specific, and we should have a
> > mechanism to make it work only for AMD IOMMU. Intel has similar support
> > in the next generation of VT-d, so a DMAR table should likewise be built
> > when virtual VT-d is enabled for the guest. I suggest this table be
> > built only when the guest runs on AMD platforms.
> > Thanks!
> > Xiantao
>
> Hi Xiantao,
> Thanks for reviewing it. Actually construct_ivrs() invokes the hypercall
> guest_iommu_set_base(), which fails on non-iommuv2 systems, including
> VT-d. So the IVRS table is already avoided on Intel systems. But I am
> also thinking that maybe we should let the user choose the iommu hardware
> up front; for example, guest_iommu={vtd, amd, 0} to distinguish different
> iommu hardware?

Hi, Wei
The term "iommu" should be vendor-neutral, and the option guest_iommu={0,1}
is also fine for VT-d. The guest's iommu depends on the host's capability,
so we can't let the user choose options like virtual iommu_v2 on VT-d or
virtual VT-d on iommu_v2. Basically, in hvmloader we should detect the
platform's vendor and then build the corresponding ACPI tables. For the
hypercall, we should review the parameters and make it work for both sides;
it may have different implementations for AMD's and Intel's platforms in
the hypervisor.
Thanks!
Xiantao

> >> [...]
Ian Jackson
2012-Mar-12 11:25 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
Wei Wang writes ("Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc"):
> The former one will be used by qemu. But Ian said we would not add new
> features to qemu-traditional, so I am waiting for the pci passthru patches
> for upstream qemu to be accepted before sending out my qemu patch. I also
> have a proof-of-concept patch based on the old qemu. I could rebase and
> resend it if necessary.
>
> http://lists.xen.org/archives/html/xen-devel/2011-12/msg01269.html

How confident are we that the new hypercall will be needed for whatever
shape of passthrough support ends up in upstream qemu?

Ian.
Wei Wang
2012-Mar-12 15:43 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
On 03/12/2012 12:25 PM, Ian Jackson wrote:
> Wei Wang writes ("Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc"):
>> The former one will be used by qemu. But Ian said we would not add new
>> features to qemu-traditional, so I am waiting for the pci passthru
>> patches for upstream qemu to be accepted before sending out my qemu
>> patch. I also have a proof-of-concept patch based on the old qemu. I
>> could rebase and resend it if necessary.
>>
>> http://lists.xen.org/archives/html/xen-devel/2011-12/msg01269.html
>
> How confident are we that the new hypercall will be needed for whatever
> shape of passthrough support ends up in upstream qemu?
>
> Ian.

Hi Ian,
Well, the hypercall itself is very simple and straightforward. It just
allows qemu to notify Xen of the guest iommu msi configuration. Beyond
that, qemu also needs to register a virtual iommu device on the qemu bus.
Since we might not wait for upstream qemu for the 4.2 release, I would
like to suggest that we take the qemu patch back into traditional qemu.
This would give Xen 4.2 complete support for iommuv2 regardless of how
upstream qemu turns out, and would also help Linux distros based on
Xen 4.2 to have full iommuv2 support. What do you think?
Thanks,
Wei
Ian Jackson
2012-Mar-14 11:55 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
Wei Wang writes ("Re: [Xen-devel] [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc"):
> Well, the hypercall itself is very simple and straightforward. It just
> allows qemu to notify Xen of the guest iommu msi configuration. Beyond
> that, qemu also needs to register a virtual iommu device on the qemu bus.
> Since we might not wait for upstream qemu for the 4.2 release, I would
> like to suggest that we take the qemu patch back into traditional qemu.
> This would give Xen 4.2 complete support for iommuv2 regardless of how
> upstream qemu turns out, and would also help Linux distros based on
> Xen 4.2 to have full iommuv2 support. What do you think?

We really are trying to keep qemu-xen-unstable feature-frozen. It is a
shame that the necessary features are still delayed getting into qemu
upstream, which has so far prevented us from switching the default over.

Ian.

(qemu-devel added to the CC)
Ian Jackson
2012-Mar-14 14:02 UTC
Re: [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
Wei Wang writes ("[Xen-devel] [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems"):
> This is patch set v6. It includes all pending patches that are needed to
> enable gpgpu passthrough and heterogeneous computing for guests.

I was looking at this again and it occurred to me to ask: what is
"heterogeneous computing for guests"?

Ian.
Ian Campbell
2012-Mar-14 14:09 UTC
Re: [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
On Thu, 2012-03-08 at 13:21 +0000, Wei Wang wrote:
> Hi,
> This is patch set v6. It includes all pending patches that are needed to
> enable gpgpu passthrough

I'm curious: is any of this actually specific to GPGPU passthrough, or is
it just that GPGPU passthrough happens to be the main (only?) reason you
would want to expose a virtualised IOMMU to the guest OS? Are there any
other use cases?

Ian.
Wei Wang
2012-Mar-14 14:32 UTC
Re: [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
On 03/14/2012 03:02 PM, Ian Jackson wrote:
> Wei Wang writes ("[Xen-devel] [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems"):
>> This is patch set v6. It includes all pending patches that are needed to
>> enable gpgpu passthrough and heterogeneous computing for guests.
>
> I was looking at this again and it occurred to me to ask: what is
> "heterogeneous computing for guests"?
>
> Ian.

It means passing a gpgpu through to the guest and running OpenCL
applications in the guest OS. Using the gpu for general-purpose computing
is possible on a native OS now; this patch set just implements the same
thing for the guest OS. It would be a nice feature for 4.2, wouldn't it?

Thanks,
Wei
Wei Wang
2012-Mar-14 14:48 UTC
Re: [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
On 03/14/2012 03:09 PM, Ian Campbell wrote:
> On Thu, 2012-03-08 at 13:21 +0000, Wei Wang wrote:
>> Hi,
>> This is patch set v6. It includes all pending patches that are needed to
>> enable gpgpu passthrough
>
> I'm curious: is any of this actually specific to GPGPU passthrough, or is
> it just that GPGPU passthrough happens to be the main (only?) reason you
> would want to expose a virtualised IOMMU to the guest OS? Are there any
> other use cases?
>
> Ian.

For now, gpgpu is the only use case. But in fact any sophisticated ATS
device that supports the PASID and PRI capabilities will need to work with
the iommuv2 driver in the guest OS, and will therefore also need a virtual
iommu in guest space. Please see the ATS spec 1.1; those capabilities are
documented there and are not specific to GPGPU.
Thanks,
Wei