Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
Hi,

This is patch set v6. It includes all pending patches that are needed to enable gpgpu passthrough and heterogeneous computing for guests.

thanks,
Wei

For more details, please refer to the old threads:
http://lists.xen.org/archives/html/xen-devel/2012-02/msg00889.html
http://lists.xen.org/archives/html/xen-devel/2012-01/msg01646.html

and, for an overview of the design, please refer to:
http://www.amd64.org/pub/iommuv2.png

=====================================================================
Changes in v6:
* Fix indentation issues.
* Fix definition of iommu_set_msi.
* Rebase on top of tip.

Changes in v5:
* Remove patch 2 after upstream c/s 24729:6f6a6d1d2fb6.

Changes in v4:
* Only the tools part in this version, since the hypervisor patches have already been committed.
* Rename guest config option from "iommu = {0,1}" to "guest_iommu = {0,1}".
* Add a description to docs/man/xl.cfg.pod.5.

Changes in v3:
* Use xenstore to receive the guest iommu configuration instead of adding a new field to hvm_info_table.
* Support PCI segments in vbdf-to-mbdf binding.
* Make hypercalls visible for non-x86 platforms.
* A few code cleanups according to comments from Jan and Ian.

Changes in v2:
* Do not use a linked list to access guest iommu tables.
* Do not parse the iommu parameter in libxl_device_model_info again.
* Fix incorrect logical calculation in patch 11.
* Fix hypercall definition for non-x86 systems.
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210211 -3600
# Node ID 810296995a893c1bec898a366f08520fbae755fb
# Parent  f611200469159a960c3f0b975aa0f91ec768e29f
amd iommu: Add 2 hypercalls for libxc

iommu_set_msi: used by qemu to inform the hypervisor of the iommu vector
number in guest space. The hypervisor needs this vector to inject an MSI
into the guest after writing PPR logs back to the guest.

iommu_bind_bdf: used by xl to bind a virtual bdf to a machine bdf for
passthru devices. The IOMMU emulator receives iommu commands from the
guest OS and then forwards them to the host iommu. Virtual device ids in
guest iommu commands must be converted into physical ids before sending
them to the real hardware.

Signed-off-by: Wei Wang <wei.wang2@amd.com>

diff -r f61120046915 -r 810296995a89 xen/drivers/passthrough/amd/iommu_guest.c
--- a/xen/drivers/passthrough/amd/iommu_guest.c	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/drivers/passthrough/amd/iommu_guest.c	Thu Mar 08 13:36:51 2012 +0100
@@ -48,14 +48,31 @@
         (reg)->hi = (val) >> 32; \
     } while (0)
 
-static unsigned int machine_bdf(struct domain *d, uint16_t guest_bdf)
+static unsigned int machine_bdf(struct domain *d, uint16_t guest_seg,
+                                uint16_t guest_bdf)
 {
-    return guest_bdf;
+    struct pci_dev *pdev;
+    uint16_t mbdf = 0;
+
+    for_each_pdev( d, pdev )
+    {
+        if ( (pdev->gbdf == guest_bdf) && (pdev->gseg == guest_seg) )
+        {
+            mbdf = PCI_BDF2(pdev->bus, pdev->devfn);
+            break;
+        }
+    }
+    return mbdf;
 }
 
-static uint16_t guest_bdf(struct domain *d, uint16_t machine_bdf)
+static uint16_t guest_bdf(struct domain *d, uint16_t machine_seg,
+                          uint16_t machine_bdf)
 {
-    return machine_bdf;
+    struct pci_dev *pdev;
+
+    pdev = pci_get_pdev_by_domain(d, machine_seg, PCI_BUS(machine_bdf),
+                                  PCI_DEVFN2(machine_bdf));
+    return pdev->gbdf;
 }
 
 static inline struct guest_iommu *domain_iommu(struct domain *d)
@@ -207,7 +224,7 @@ void guest_iommu_add_ppr_log(struct doma
     log = log_base + tail % (PAGE_SIZE / sizeof(ppr_entry_t));
 
     /* Convert physical device id back into virtual device id */
-    gdev_id = guest_bdf(d, iommu_get_devid_from_cmd(entry[0]));
+    gdev_id = guest_bdf(d, 0, iommu_get_devid_from_cmd(entry[0]));
     iommu_set_devid_to_cmd(&entry[0], gdev_id);
 
     memcpy(log, entry, sizeof(ppr_entry_t));
@@ -256,7 +273,7 @@ void guest_iommu_add_event_log(struct do
     log = log_base + tail % (PAGE_SIZE / sizeof(event_entry_t));
 
     /* re-write physical device id into virtual device id */
-    dev_id = guest_bdf(d, iommu_get_devid_from_cmd(entry[0]));
+    dev_id = guest_bdf(d, 0, iommu_get_devid_from_cmd(entry[0]));
     iommu_set_devid_to_cmd(&entry[0], dev_id);
 
     memcpy(log, entry, sizeof(event_entry_t));
@@ -278,7 +295,7 @@ static int do_complete_ppr_request(struc
     uint16_t dev_id;
     struct amd_iommu *iommu;
 
-    dev_id = machine_bdf(d, iommu_get_devid_from_cmd(cmd->data[0]));
+    dev_id = machine_bdf(d, 0, iommu_get_devid_from_cmd(cmd->data[0]));
     iommu = find_iommu_for_device(0, dev_id);
 
     if ( !iommu )
@@ -330,7 +347,7 @@ static int do_invalidate_iotlb_pages(str
     struct amd_iommu *iommu;
     uint16_t dev_id;
 
-    dev_id = machine_bdf(d, iommu_get_devid_from_cmd(cmd->data[0]));
+    dev_id = machine_bdf(d, 0, iommu_get_devid_from_cmd(cmd->data[0]));
     iommu = find_iommu_for_device(0, dev_id);
 
     if ( !iommu )
@@ -409,7 +426,7 @@ static int do_invalidate_dte(struct doma
     g_iommu = domain_iommu(d);
     gbdf = iommu_get_devid_from_cmd(cmd->data[0]);
-    mbdf = machine_bdf(d, gbdf);
+    mbdf = machine_bdf(d, 0, gbdf);
 
     /* Guest can only update DTEs for its passthru devices */
     if ( mbdf == 0 || gbdf == 0 )
@@ -916,3 +933,44 @@ const struct hvm_mmio_handler iommu_mmio
     .read_handler = guest_iommu_mmio_read,
     .write_handler = guest_iommu_mmio_write
 };
+
+/* iommu hypercall handler */
+int iommu_bind_bdf(struct domain* d, uint16_t gseg, uint16_t gbdf,
+                   uint16_t mseg, uint16_t mbdf)
+{
+    struct pci_dev *pdev;
+    int ret = -ENODEV;
+
+    if ( !iommu_found() || !iommu_enabled || !iommuv2_enabled )
+        return 0;
+
+    spin_lock(&pcidevs_lock);
+
+    for_each_pdev( d, pdev )
+    {
+        if ( (pdev->seg != mseg) || (pdev->bus != PCI_BUS(mbdf) ) ||
+             (pdev->devfn != PCI_DEVFN2(mbdf)) )
+            continue;
+
+        pdev->gseg = gseg;
+        pdev->gbdf = gbdf;
+        ret = 0;
+    }
+
+    spin_unlock(&pcidevs_lock);
+    return ret;
+}
+
+void iommu_set_msi(struct domain* d, uint8_t vector, uint8_t dest,
+                   uint8_t dest_mode, uint8_t delivery_mode, uint8_t trig_mode)
+{
+    struct guest_iommu *iommu = domain_iommu(d);
+
+    if ( !iommu )
+        return;
+
+    iommu->msi.vector = vector;
+    iommu->msi.dest = dest;
+    iommu->msi.dest_mode = dest_mode;
+    iommu->msi.trig_mode = trig_mode;
+}
diff -r f61120046915 -r 810296995a89 xen/drivers/passthrough/iommu.c
--- a/xen/drivers/passthrough/iommu.c	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/drivers/passthrough/iommu.c	Thu Mar 08 13:36:51 2012 +0100
@@ -653,6 +653,40 @@ int iommu_do_domctl(
         put_domain(d);
         break;
 
+    case XEN_DOMCTL_guest_iommu_op:
+    {
+        xen_domctl_guest_iommu_op_t *guest_op;
+
+        if ( unlikely((d = get_domain_by_id(domctl->domain)) == NULL) )
+        {
+            gdprintk(XENLOG_ERR,
+                "XEN_DOMCTL_guest_iommu_op: get_domain_by_id() failed\n");
+            ret = -EINVAL;
+            break;
+        }
+
+        guest_op = &(domctl->u.guest_iommu_op);
+        switch ( guest_op->op )
+        {
+        case XEN_DOMCTL_GUEST_IOMMU_OP_SET_MSI:
+            iommu_set_msi(d, guest_op->u.msi.vector,
+                          guest_op->u.msi.dest,
+                          guest_op->u.msi.dest_mode,
+                          guest_op->u.msi.delivery_mode,
+                          guest_op->u.msi.trig_mode);
+            ret = 0;
+            break;
+        case XEN_DOMCTL_GUEST_IOMMU_OP_BIND_BDF:
+            ret = iommu_bind_bdf(d, guest_op->u.bdf_bind.g_seg,
+                                 guest_op->u.bdf_bind.g_bdf,
+                                 guest_op->u.bdf_bind.m_seg,
+                                 guest_op->u.bdf_bind.m_bdf);
+            break;
+        }
+        put_domain(d);
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff -r f61120046915 -r 810296995a89 xen/include/public/domctl.h
--- a/xen/include/public/domctl.h	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/include/public/domctl.h	Thu Mar 08 13:36:51 2012 +0100
@@ -820,6 +820,31 @@ struct xen_domctl_set_access_required {
 typedef struct xen_domctl_set_access_required xen_domctl_set_access_required_t;
 DEFINE_XEN_GUEST_HANDLE(xen_domctl_set_access_required_t);
 
+/* Support for guest iommu emulation */
+struct xen_domctl_guest_iommu_op {
+    /* XEN_DOMCTL_GUEST_IOMMU_OP_* */
+#define XEN_DOMCTL_GUEST_IOMMU_OP_SET_MSI  0
+#define XEN_DOMCTL_GUEST_IOMMU_OP_BIND_BDF 1
+    uint8_t op;
+    union {
+        struct iommu_msi {
+            uint8_t vector;
+            uint8_t dest;
+            uint8_t dest_mode;
+            uint8_t delivery_mode;
+            uint8_t trig_mode;
+        } msi;
+        struct bdf_bind {
+            uint16_t g_seg;
+            uint16_t g_bdf;
+            uint16_t m_seg;
+            uint16_t m_bdf;
+        } bdf_bind;
+    } u;
+};
+typedef struct xen_domctl_guest_iommu_op xen_domctl_guest_iommu_op_t;
+DEFINE_XEN_GUEST_HANDLE(xen_domctl_guest_iommu_op_t);
+
 struct xen_domctl {
     uint32_t cmd;
 #define XEN_DOMCTL_createdomain                   1
@@ -885,6 +910,7 @@ struct xen_domctl {
 #define XEN_DOMCTL_set_access_required           64
 #define XEN_DOMCTL_audit_p2m                     65
 #define XEN_DOMCTL_set_virq_handler              66
+#define XEN_DOMCTL_guest_iommu_op                67
 #define XEN_DOMCTL_gdbsx_guestmemio            1000
 #define XEN_DOMCTL_gdbsx_pausevcpu             1001
 #define XEN_DOMCTL_gdbsx_unpausevcpu           1002
@@ -933,6 +959,7 @@ struct xen_domctl {
         struct xen_domctl_debug_op          debug_op;
         struct xen_domctl_mem_event_op      mem_event_op;
         struct xen_domctl_mem_sharing_op    mem_sharing_op;
+        struct xen_domctl_guest_iommu_op    guest_iommu_op;
 #if defined(__i386__) || defined(__x86_64__)
         struct xen_domctl_cpuid             cpuid;
         struct xen_domctl_vcpuextstate      vcpuextstate;
diff -r f61120046915 -r 810296995a89 xen/include/xen/iommu.h
--- a/xen/include/xen/iommu.h	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/include/xen/iommu.h	Thu Mar 08 13:36:51 2012 +0100
@@ -164,6 +164,11 @@ int iommu_do_domctl(struct xen_domctl *,
 void iommu_iotlb_flush(struct domain *d, unsigned long gfn,
                        unsigned int page_count);
 void iommu_iotlb_flush_all(struct domain *d);
 
+/* Only used by AMD IOMMU so far */
+void iommu_set_msi(struct domain* d, uint8_t vector, uint8_t dest,
+                   uint8_t dest_mode, uint8_t delivery_mode, uint8_t trig_mode);
+int iommu_bind_bdf(struct domain* d, uint16_t gseg, uint16_t gbdf,
+                   uint16_t mseg, uint16_t mbdf);
+
 /*
  * The purpose of the iommu_dont_flush_iotlb optional cpu flag is to
  * avoid unecessary iotlb_flush in the low level IOMMU code.
diff -r f61120046915 -r 810296995a89 xen/include/xen/pci.h
--- a/xen/include/xen/pci.h	Wed Mar 07 11:50:31 2012 +0100
+++ b/xen/include/xen/pci.h	Thu Mar 08 13:36:51 2012 +0100
@@ -62,6 +62,11 @@ struct pci_dev {
     const u16 seg;
     const u8 bus;
     const u8 devfn;
+
+    /* Used by iommu to represent virtual seg and bdf value in guest space */
+    u16 gseg;
+    u16 gbdf;
+
     struct pci_dev_info info;
     struct arch_pci_dev arch;
     u64 vf_rlen[6];
Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 2 of 6 V6] amd iommu: call guest_iommu_set_base from hvmloader
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210214 -3600
# Node ID e9d74ec1077472f9127c43903811ce3107fc038d
# Parent  810296995a893c1bec898a366f08520fbae755fb
amd iommu: call guest_iommu_set_base from hvmloader.

The IOMMU MMIO base address is dynamically allocated by firmware. This
patch allows hvmloader to notify the hypervisor where the iommu mmio
pages are.

Signed-off-by: Wei Wang <wei.wang2@amd.com>

diff -r 810296995a89 -r e9d74ec10774 xen/arch/x86/hvm/hvm.c
--- a/xen/arch/x86/hvm/hvm.c	Thu Mar 08 13:36:51 2012 +0100
+++ b/xen/arch/x86/hvm/hvm.c	Thu Mar 08 13:36:54 2012 +0100
@@ -66,6 +66,7 @@
 #include <asm/mem_event.h>
 #include <asm/mem_access.h>
 #include <public/mem_event.h>
+#include <asm/hvm/svm/amd-iommu-proto.h>
 
 bool_t __read_mostly hvm_enabled;
 
@@ -3786,6 +3787,9 @@ long do_hvm_op(unsigned long op, XEN_GUE
             case HVM_PARAM_BUFIOREQ_EVTCHN:
                 rc = -EINVAL;
                 break;
+            case HVM_PARAM_IOMMU_BASE:
+                rc = guest_iommu_set_base(d, a.value);
+                break;
             }
 
             if ( rc == 0 )
diff -r 810296995a89 -r e9d74ec10774 xen/include/public/hvm/params.h
--- a/xen/include/public/hvm/params.h	Thu Mar 08 13:36:51 2012 +0100
+++ b/xen/include/public/hvm/params.h	Thu Mar 08 13:36:54 2012 +0100
@@ -141,7 +141,8 @@
 /* Boolean: Enable nestedhvm (hvm only) */
 #define HVM_PARAM_NESTEDHVM    24
 
+#define HVM_PARAM_IOMMU_BASE   27
 
-#define HVM_NR_PARAMS          27
+#define HVM_NR_PARAMS          28
 
 #endif /* __XEN_PUBLIC_HVM_PARAMS_H__ */
Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 3 of 6 V6] hvmloader: Build IVRS table
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210217 -3600
# Node ID d0611a8ee06d3f34de1c7c51da8571d9e1a668e1
# Parent  e9d74ec1077472f9127c43903811ce3107fc038d
hvmloader: Build IVRS table.

There are 32 ivrs padding entries allocated at the beginning. If a
passthru device has been found on the qemu bus, a padding entry will be
replaced by a real device entry.

This patch has been tested with both rombios and seabios.

Signed-off-by: Wei Wang <wei.wang2@amd.com>

diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/acpi/acpi2_0.h
--- a/tools/firmware/hvmloader/acpi/acpi2_0.h	Thu Mar 08 13:36:54 2012 +0100
+++ b/tools/firmware/hvmloader/acpi/acpi2_0.h	Thu Mar 08 13:36:57 2012 +0100
@@ -389,6 +389,60 @@ struct acpi_20_madt_intsrcovr {
 #define ACPI_2_0_WAET_REVISION 0x01
 #define ACPI_1_0_FADT_REVISION 0x01
 
+#define IVRS_SIGNATURE ASCII32('I','V','R','S')
+#define IVRS_REVISION  1
+#define IVRS_VASIZE    64
+#define IVRS_PASIZE    52
+#define IVRS_GVASIZE   64
+
+#define IVHD_BLOCK_TYPE     0x10
+#define IVHD_FLAG_HTTUNEN   (1 << 0)
+#define IVHD_FLAG_PASSPW    (1 << 1)
+#define IVHD_FLAG_RESPASSPW (1 << 2)
+#define IVHD_FLAG_ISOC      (1 << 3)
+#define IVHD_FLAG_IOTLBSUP  (1 << 4)
+#define IVHD_FLAG_COHERENT  (1 << 5)
+#define IVHD_FLAG_PREFSUP   (1 << 6)
+#define IVHD_FLAG_PPRSUP    (1 << 7)
+
+#define IVHD_EFR_GTSUP      (1 << 2)
+#define IVHD_EFR_IASUP      (1 << 5)
+
+#define IVHD_SELECT_4_BYTE  0x2
+
+struct ivrs_ivhd_block
+{
+    uint8_t  type;
+    uint8_t  flags;
+    uint16_t length;
+    uint16_t devid;
+    uint16_t cap_offset;
+    uint64_t iommu_base_addr;
+    uint16_t pci_segment;
+    uint16_t iommu_info;
+    uint32_t reserved;
+};
+
+/* IVHD 4-byte device entries */
+struct ivrs_ivhd_device
+{
+    uint8_t  type;
+    uint16_t dev_id;
+    uint8_t  flags;
+};
+
+#define PT_DEV_MAX_NR    32
+#define IOMMU_CAP_OFFSET 0x40
+struct acpi_40_ivrs
+{
+    struct acpi_header header;
+    uint32_t iv_info;
+    uint32_t reserved[2];
+    struct ivrs_ivhd_block ivhd_block;
+    struct ivrs_ivhd_device ivhd_device[PT_DEV_MAX_NR];
+};
+
+
 #pragma pack ()
 
 struct acpi_config {
diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/acpi/build.c
--- a/tools/firmware/hvmloader/acpi/build.c	Thu Mar 08 13:36:54 2012 +0100
+++ b/tools/firmware/hvmloader/acpi/build.c	Thu Mar 08 13:36:57 2012 +0100
@@ -23,6 +23,8 @@
 #include "ssdt_pm.h"
 #include "../config.h"
 #include "../util.h"
+#include "../hypercall.h"
+#include <xen/hvm/params.h>
 
 #define align16(sz)        (((sz) + 15) & ~15)
 #define fixed_strcpy(d, s) strncpy((d), (s), sizeof(d))
@@ -198,6 +200,87 @@ static struct acpi_20_waet *construct_wa
     return waet;
 }
 
+extern uint32_t ptdev_bdf[PT_DEV_MAX_NR];
+extern uint32_t ptdev_nr;
+extern uint32_t iommu_bdf;
+static struct acpi_40_ivrs* construct_ivrs(void)
+{
+    struct acpi_40_ivrs *ivrs;
+    uint64_t mmio;
+    struct ivrs_ivhd_block *ivhd;
+    struct ivrs_ivhd_device *dev_entry;
+    struct xen_hvm_param p;
+
+    if (ptdev_nr == 0 || iommu_bdf == 0) return NULL;
+
+    ivrs = mem_alloc(sizeof(*ivrs), 16);
+    if (!ivrs)
+    {
+        printf("unable to build IVRS tables: out of memory\n");
+        return NULL;
+    }
+    memset(ivrs, 0, sizeof(*ivrs));
+
+    /* initialize acpi header */
+    ivrs->header.signature = IVRS_SIGNATURE;
+    ivrs->header.revision = IVRS_REVISION;
+    fixed_strcpy(ivrs->header.oem_id, ACPI_OEM_ID);
+    fixed_strcpy(ivrs->header.oem_table_id, ACPI_OEM_TABLE_ID);
+
+    ivrs->header.oem_revision = ACPI_OEM_REVISION;
+    ivrs->header.creator_id = ACPI_CREATOR_ID;
+    ivrs->header.creator_revision = ACPI_CREATOR_REVISION;
+
+    ivrs->header.length = sizeof(*ivrs);
+
+    /* initialize IVHD Block */
+    ivhd = &ivrs->ivhd_block;
+    ivrs->iv_info = (IVRS_VASIZE << 15) | (IVRS_PASIZE << 8) |
+                    (IVRS_GVASIZE << 5);
+
+    ivhd->type = IVHD_BLOCK_TYPE;
+    ivhd->flags = IVHD_FLAG_PPRSUP | IVHD_FLAG_IOTLBSUP;
+    ivhd->devid = iommu_bdf;
+    ivhd->cap_offset = IOMMU_CAP_OFFSET;
+
+    /* reserve 32K IOMMU MMIO space */
+    mmio = virt_to_phys(mem_alloc(0x8000, 0x1000));
+    if (!mmio)
+    {
+        printf("unable to reserve iommu mmio pages: out of memory\n");
+        return NULL;
+    }
+
+    p.domid = DOMID_SELF;
+    p.index = HVM_PARAM_IOMMU_BASE;
+    p.value = mmio;
+
+    /* Returns non-zero if IOMMUv2 hardware is not available */
+    if ( hypercall_hvm_op(HVMOP_set_param, &p) )
+    {
+        printf("unable to set iommu mmio base address\n");
+        return NULL;
+    }
+
+    ivhd->iommu_base_addr = mmio;
+    ivhd->reserved = IVHD_EFR_IASUP | IVHD_EFR_GTSUP;
+
+    /* Build IVHD device entries */
+    dev_entry = ivrs->ivhd_device;
+    for ( int i = 0; i < ptdev_nr; i++ )
+    {
+        dev_entry[i].type = IVHD_SELECT_4_BYTE;
+        dev_entry[i].dev_id = ptdev_bdf[i];
+        dev_entry[i].flags = 0;
+    }
+
+    ivhd->length = sizeof(*ivhd) + sizeof(*dev_entry) * PT_DEV_MAX_NR;
+    set_checksum(ivrs, offsetof(struct acpi_header, checksum),
+                 ivrs->header.length);
+
+    return ivrs;
+}
+
 static int construct_secondary_tables(unsigned long *table_ptrs,
                                       struct acpi_info *info)
 {
@@ -206,6 +289,7 @@ static int construct_secondary_tables(un
     struct acpi_20_hpet *hpet;
     struct acpi_20_waet *waet;
     struct acpi_20_tcpa *tcpa;
+    struct acpi_40_ivrs *ivrs;
     unsigned char *ssdt;
     static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001};
     uint16_t *tis_hdr;
@@ -293,6 +377,13 @@ static int construct_secondary_tables(un
         }
     }
 
+    if ( !strncmp(xenstore_read("guest_iommu", "1"), "1", 1) )
+    {
+        ivrs = construct_ivrs();
+        if ( ivrs != NULL )
+            table_ptrs[nr_tables++] = (unsigned long)ivrs;
+    }
+
     table_ptrs[nr_tables] = 0;
     return nr_tables;
 }
diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/pci.c
--- a/tools/firmware/hvmloader/pci.c	Thu Mar 08 13:36:54 2012 +0100
+++ b/tools/firmware/hvmloader/pci.c	Thu Mar 08 13:36:57 2012 +0100
@@ -34,11 +34,17 @@ unsigned long pci_mem_end = PCI_MEM_END;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+/* support up to 32 passthrough devices */
+#define PT_DEV_MAX_NR 32
+uint32_t ptdev_bdf[PT_DEV_MAX_NR];
+uint32_t ptdev_nr;
+uint32_t iommu_bdf = 0;
+
 void pci_setup(void)
 {
     uint32_t base, devfn, bar_reg, bar_data, bar_sz, cmd, mmio_total = 0;
     uint32_t vga_devfn = 256;
-    uint16_t class, vendor_id, device_id;
+    uint16_t class, vendor_id, device_id, sub_vendor_id;
     unsigned int bar, pin, link, isa_irq;
 
     /* Resources assignable to PCI devices via BARs. */
@@ -72,12 +78,34 @@ void pci_setup(void)
         class     = pci_readw(devfn, PCI_CLASS_DEVICE);
         vendor_id = pci_readw(devfn, PCI_VENDOR_ID);
         device_id = pci_readw(devfn, PCI_DEVICE_ID);
+        sub_vendor_id = pci_readw(devfn, PCI_SUBSYSTEM_VENDOR_ID);
+
         if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
             continue;
 
         ASSERT((devfn != PCI_ISA_DEVFN) ||
                ((vendor_id == 0x8086) && (device_id == 0x7000)));
 
+        /* Found amd iommu device. */
+        if ( class == 0x0806 && vendor_id == 0x1022 )
+        {
+            iommu_bdf = devfn;
+            continue;
+        }
+        /* IVRS: Detecting passthrough devices.
+         * sub_vendor_id != citrix && sub_vendor_id != qemu */
+        if ( sub_vendor_id != 0x5853 && sub_vendor_id != 0x1af4 )
+        {
+            /* found a passthru device */
+            if ( ptdev_nr < PT_DEV_MAX_NR )
+            {
+                ptdev_bdf[ptdev_nr] = devfn;
+                ptdev_nr++;
+            }
+            else
+                printf("Number of passthru devices > PT_DEV_MAX_NR \n");
+        }
+
         switch ( class )
         {
         case 0x0300:
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210219 -3600
# Node ID d3c4ac0b7cc771ce9639cf3f9f14c0e10d85784d
# Parent  d0611a8ee06d3f34de1c7c51da8571d9e1a668e1
libxc: add wrappers for new hypercalls

Please see patch 1 for the hypercall description.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff -r d0611a8ee06d -r d3c4ac0b7cc7 tools/libxc/xc_domain.c
--- a/tools/libxc/xc_domain.c	Thu Mar 08 13:36:57 2012 +0100
+++ b/tools/libxc/xc_domain.c	Thu Mar 08 13:36:59 2012 +0100
@@ -1352,6 +1352,59 @@ int xc_domain_bind_pt_isa_irq(
                                   PT_IRQ_TYPE_ISA, 0, 0, 0, machine_irq));
 }
 
+int xc_domain_update_iommu_msi(
+    xc_interface *xch,
+    uint32_t domid,
+    uint8_t vector,
+    uint8_t dest,
+    uint8_t dest_mode,
+    uint8_t delivery_mode,
+    uint8_t trig_mode)
+{
+    int rc;
+    DECLARE_DOMCTL;
+    xen_domctl_guest_iommu_op_t * iommu_op;
+
+    domctl.cmd = XEN_DOMCTL_guest_iommu_op;
+    domctl.domain = (domid_t)domid;
+
+    iommu_op = &(domctl.u.guest_iommu_op);
+    iommu_op->op = XEN_DOMCTL_GUEST_IOMMU_OP_SET_MSI;
+    iommu_op->u.msi.vector = vector;
+    iommu_op->u.msi.dest = dest;
+    iommu_op->u.msi.dest_mode = dest_mode;
+    iommu_op->u.msi.delivery_mode = delivery_mode;
+    iommu_op->u.msi.trig_mode = trig_mode;
+
+    rc = do_domctl(xch, &domctl);
+    return rc;
+}
+
+int xc_domain_bind_pt_bdf(xc_interface *xch,
+                          uint32_t domid,
+                          uint16_t gseg,
+                          uint16_t gbdf,
+                          uint16_t mseg,
+                          uint16_t mbdf)
+{
+    int rc;
+    DECLARE_DOMCTL;
+    xen_domctl_guest_iommu_op_t * guest_op;
+
+    domctl.cmd = XEN_DOMCTL_guest_iommu_op;
+    domctl.domain = (domid_t)domid;
+
+    guest_op = &(domctl.u.guest_iommu_op);
+    guest_op->op = XEN_DOMCTL_GUEST_IOMMU_OP_BIND_BDF;
+    guest_op->u.bdf_bind.g_seg = gseg;
+    guest_op->u.bdf_bind.g_bdf = gbdf;
+    guest_op->u.bdf_bind.m_seg = mseg;
+    guest_op->u.bdf_bind.m_bdf = mbdf;
+
+    rc = do_domctl(xch, &domctl);
+    return rc;
+}
+
 int xc_domain_memory_mapping(
     xc_interface *xch,
     uint32_t domid,
diff -r d0611a8ee06d -r d3c4ac0b7cc7 tools/libxc/xenctrl.h
--- a/tools/libxc/xenctrl.h	Thu Mar 08 13:36:57 2012 +0100
+++ b/tools/libxc/xenctrl.h	Thu Mar 08 13:36:59 2012 +0100
@@ -1734,6 +1734,21 @@ int xc_domain_bind_pt_isa_irq(xc_interfa
                               uint32_t domid,
                               uint8_t machine_irq);
 
+int xc_domain_bind_pt_bdf(xc_interface *xch,
+                          uint32_t domid,
+                          uint16_t gseg,
+                          uint16_t gbdf,
+                          uint16_t mseg,
+                          uint16_t mbdf);
+
+int xc_domain_update_iommu_msi(xc_interface *xch,
+                               uint32_t domid,
+                               uint8_t vector,
+                               uint8_t dest,
+                               uint8_t dest_mode,
+                               uint8_t delivery_mode,
+                               uint8_t trig_mode);
+
 int xc_domain_set_machine_address_size(xc_interface *xch,
                                        uint32_t domid,
                                        unsigned int width);
Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 5 of 6 V6] libxl: bind virtual bdf to physical bdf after device assignment
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210222 -3600
# Node ID 0a1de2dea27370d71d2572869d363d9e5833648e
# Parent  d3c4ac0b7cc771ce9639cf3f9f14c0e10d85784d
libxl: bind virtual bdf to physical bdf after device assignment

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff -r d3c4ac0b7cc7 -r 0a1de2dea273 tools/libxl/libxl_pci.c
--- a/tools/libxl/libxl_pci.c	Thu Mar 08 13:36:59 2012 +0100
+++ b/tools/libxl/libxl_pci.c	Thu Mar 08 13:37:02 2012 +0100
@@ -720,6 +720,13 @@ out:
             LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_assign_device failed");
             return ERROR_FAIL;
         }
+        if (LIBXL__DOMAIN_IS_TYPE(gc, domid, HVM)) {
+            rc = xc_domain_bind_pt_bdf(ctx->xch, domid, 0, pcidev->vdevfn, pcidev->domain, pcidev_encode_bdf(pcidev));
+            if ( rc ) {
+                LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_bind_pt_bdf failed");
+                return ERROR_FAIL;
+            }
+        }
     }
 
     if (!starting)
Wei Wang
2012-Mar-08 13:21 UTC
[PATCH 6 of 6 V6] libxl: Introduce a new guest config file parameter
# HG changeset patch
# User Wei Wang <wei.wang2@amd.com>
# Date 1331210225 -3600
# Node ID 4c16ebeae0adace4493299b2f6d9c74ce8e6f889
# Parent  0a1de2dea27370d71d2572869d363d9e5833648e
libxl: Introduce a new guest config file parameter

Use guest_iommu = {1,0} to enable or disable guest iommu emulation.
The default value is 0. Regression tests have been done to make sure
it does not break non-iommuv2 systems.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

diff -r 0a1de2dea273 -r 4c16ebeae0ad docs/man/xl.cfg.pod.5
--- a/docs/man/xl.cfg.pod.5	Thu Mar 08 13:37:02 2012 +0100
+++ b/docs/man/xl.cfg.pod.5	Thu Mar 08 13:37:05 2012 +0100
@@ -851,6 +851,11 @@ certainly belong in a more appropriate s
 Enable graphics device PCI passthrough. XXX which device is passed
 through ?
 
+=item B<guest_iommu=BOOLEAN>
+
+Enable a virtual iommu device for an HVM guest. It should be enabled to
+pass through AMD HD7900 series GPGPUs.
+
 =item B<nomigrate=BOOLEAN>
 
 Disable migration of this domain.  This enables certain other features
diff -r 0a1de2dea273 -r 4c16ebeae0ad tools/libxl/libxl_create.c
--- a/tools/libxl/libxl_create.c	Thu Mar 08 13:37:02 2012 +0100
+++ b/tools/libxl/libxl_create.c	Thu Mar 08 13:37:05 2012 +0100
@@ -146,6 +146,7 @@ int libxl__domain_build_info_setdefault(
         libxl_defbool_setdefault(&b_info->u.hvm.hpet,               true);
         libxl_defbool_setdefault(&b_info->u.hvm.vpt_align,          true);
         libxl_defbool_setdefault(&b_info->u.hvm.nested_hvm,         false);
+        libxl_defbool_setdefault(&b_info->u.hvm.guest_iommu,        false);
         libxl_defbool_setdefault(&b_info->u.hvm.incr_generationid,  false);
         libxl_defbool_setdefault(&b_info->u.hvm.usb,                false);
         libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
@@ -237,13 +238,15 @@ int libxl__domain_build(libxl__gc *gc,
         vments[4] = "start_time";
         vments[5] = libxl__sprintf(gc, "%lu.%02d", start_time.tv_sec,(int)start_time.tv_usec/10000);
 
-        localents = libxl__calloc(gc, 7, sizeof(char *));
+        localents = libxl__calloc(gc, 9, sizeof(char *));
         localents[0] = "platform/acpi";
         localents[1] = libxl_defbool_val(info->u.hvm.acpi) ? "1" : "0";
         localents[2] = "platform/acpi_s3";
         localents[3] = libxl_defbool_val(info->u.hvm.acpi_s3) ? "1" : "0";
         localents[4] = "platform/acpi_s4";
         localents[5] = libxl_defbool_val(info->u.hvm.acpi_s4) ? "1" : "0";
+        localents[6] = "guest_iommu";
+        localents[7] = libxl_defbool_val(info->u.hvm.guest_iommu) ? "1" : "0";
 
         break;
     case LIBXL_DOMAIN_TYPE_PV:
diff -r 0a1de2dea273 -r 4c16ebeae0ad tools/libxl/libxl_types.idl
--- a/tools/libxl/libxl_types.idl	Thu Mar 08 13:37:02 2012 +0100
+++ b/tools/libxl/libxl_types.idl	Thu Mar 08 13:37:05 2012 +0100
@@ -268,6 +268,7 @@ libxl_domain_build_info = Struct("domain
                                        ("vpt_align",        libxl_defbool),
                                        ("timer_mode",       libxl_timer_mode),
                                        ("nested_hvm",       libxl_defbool),
+                                       ("guest_iommu",      libxl_defbool),
                                        ("incr_generationid",libxl_defbool),
                                        ("nographic",        libxl_defbool),
                                        ("stdvga",           libxl_defbool),
diff -r 0a1de2dea273 -r 4c16ebeae0ad tools/libxl/xl_cmdimpl.c
--- a/tools/libxl/xl_cmdimpl.c	Thu Mar 08 13:37:02 2012 +0100
+++ b/tools/libxl/xl_cmdimpl.c	Thu Mar 08 13:37:05 2012 +0100
@@ -749,6 +749,7 @@ static void parse_config_data(const char
         }
 
         xlu_cfg_get_defbool(config, "nestedhvm", &b_info->u.hvm.nested_hvm, 0);
+        xlu_cfg_get_defbool(config, "guest_iommu", &b_info->u.hvm.guest_iommu, 0);
 
         break;
     case LIBXL_DOMAIN_TYPE_PV: {
Jan Beulich
2012-Mar-08 13:35 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
>>> On 08.03.12 at 14:21, Wei Wang <wei.wang2@amd.com> wrote:
> # HG changeset patch
> # User Wei Wang <wei.wang2@amd.com>
> # Date 1331210211 -3600
> # Node ID 810296995a893c1bec898a366f08520fbae755fb
> # Parent  f611200469159a960c3f0b975aa0f91ec768e29f
> amd iommu: Add 2 hypercalls for libxc
>
> iommu_set_msi: used by qemu to inform hypervisor iommu vector number in
> guest space. Hypervisor needs this vector to inject msi into guest after
> writing PPR logs back to guest.
>
> iommu_bind_bdf: used by xl to bind virtual bdf to machine bdf for passthru
> devices. IOMMU emulator receives iommu cmd from guest OS and then forwards
> them to host iommu. Virtual device ids in guest iommu commands must be
> converted into physical ids before sending them to real hardware.

So patch 5 uses the latter one, but the former one is still dead code?

Jan
Wei Wang
2012-Mar-08 14:18 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
On 03/08/2012 02:35 PM, Jan Beulich wrote:
>>>> On 08.03.12 at 14:21, Wei Wang<wei.wang2@amd.com> wrote:
>> # HG changeset patch
>> # User Wei Wang<wei.wang2@amd.com>
>> # Date 1331210211 -3600
>> # Node ID 810296995a893c1bec898a366f08520fbae755fb
>> # Parent  f611200469159a960c3f0b975aa0f91ec768e29f
>> amd iommu: Add 2 hypercalls for libxc
>>
>> iommu_set_msi: used by qemu to inform hypervisor iommu vector number in
>> guest space. Hypervisor needs this vector to inject msi into guest after
>> writing PPR logs back to guest.
>>
>> iommu_bind_bdf: used by xl to bind virtual bdf to machine bdf for passthru
>> devices. IOMMU emulator receives iommu cmd from guest OS and then forwards
>> them to host iommu. Virtual device ids in guest iommu commands must be
>> converted into physical ids before sending them to real hardware.
>
> So patch 5 uses the latter one, but the former one is still dead code?
>
> Jan

Hi Jan,

The former one will be used by qemu. But Ian said we would not add new
features to qemu-traditional, so I am waiting for the PCI passthru
patches for upstream qemu to be accepted; then I could send out my qemu
patch. I also have a proof-of-concept patch that was based on the old
qemu. I could rebase and resend it if necessary.
http://lists.xen.org/archives/html/xen-devel/2011-12/msg01269.html

thanks,
Wei
I think this IVRS table should be vendor-specific, and we should have a
mechanism to make it work only for the AMD IOMMU. Intel has similar
support in the next generation of VT-d: a DMAR table likewise needs to
be built when a virtual VT-d is enabled for the guest. I suggest this
table only be built when the guest is running on AMD platforms.

Thanks!
Xiantao

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of Wei Wang
> Sent: Thursday, March 08, 2012 9:22 PM
> To: Ian.Jackson@eu.citrix.com; Ian.Campbell@citrix.com; JBeulich@suse.com;
> keir@xen.org
> Cc: xen-devel@lists.xensource.com
> Subject: [Xen-devel] [PATCH 3 of 6 V6] hvmloader: Build IVRS table
>
> # HG changeset patch
> # User Wei Wang <wei.wang2@amd.com>
> # Date 1331210217 -3600
> # Node ID d0611a8ee06d3f34de1c7c51da8571d9e1a668e1
> # Parent e9d74ec1077472f9127c43903811ce3107fc038d
> hvmloader: Build IVRS table.
>
> There are 32 ivrs padding entries allocated at the beginning. If a passthru
> device has been found from qemu bus, a padding entry will be replaced by a
> real device entry.
>
> This patch has been tested with both rombios and seabios
>
> Signed-off-by: Wei Wang <wei.wang2@amd.com>
Jan Beulich
2012-Mar-08 14:41 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
>>> On 08.03.12 at 15:18, Wei Wang <wei.wang2@amd.com> wrote:
> On 03/08/2012 02:35 PM, Jan Beulich wrote:
>>>>> On 08.03.12 at 14:21, Wei Wang <wei.wang2@amd.com> wrote:
>>> # HG changeset patch
>>> # User Wei Wang <wei.wang2@amd.com>
>>> # Date 1331210211 -3600
>>> # Node ID 810296995a893c1bec898a366f08520fbae755fb
>>> # Parent  f611200469159a960c3f0b975aa0f91ec768e29f
>>> amd iommu: Add 2 hypercalls for libxc
>>>
>>> iommu_set_msi: used by qemu to inform the hypervisor of the iommu vector
>>> number in guest space. The hypervisor needs this vector to inject an msi
>>> into the guest after writing PPR logs back to the guest.
>>>
>>> iommu_bind_bdf: used by xl to bind a virtual bdf to a machine bdf for
>>> passthru devices. The IOMMU emulator receives iommu commands from the
>>> guest OS and forwards them to the host iommu. Virtual device ids in guest
>>> iommu commands must be converted into physical ids before they are sent
>>> to the real hardware.
>>
>> So patch 5 uses the latter one, but the former one is still dead code?
>
> The former one will be used by qemu. But Ian said we would not add new
> features to qemu-traditional, so I am waiting for the pci passthru patches
> for upstream qemu to be accepted before sending out my qemu patch. I also
> have a proof-of-concept patch based on the old qemu; I could rebase and
> resend it if necessary.

That sounds fine to me then.

Jan
On 03/08/2012 03:22 PM, Zhang, Xiantao wrote:
> I think this IVRS table should be vendor-specific, and we should have a
> mechanism to make it work only for AMD IOMMU. Intel has similar support in
> the next generation of VT-d, so a DMAR table should likewise be built when
> virtual VT-d is enabled for the guest. I suggest this table be built only
> when the guest runs on AMD platforms.
> Thanks!
> Xiantao

Hi Xiantao,
Thanks for reviewing it. Actually construct_ivrs() invokes the hypercall
guest_iommu_set_base(), which fails on non-iommuv2 systems, including VT-d.
So the IVRS table is already avoided on Intel systems. But I am also
thinking that maybe we should let the user choose the iommu hardware up
front; for example, guest_iommu={vtd, amd, 0} to distinguish different
iommu hardware?
Thanks,
Wei

>> -----Original Message-----
>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>> bounces@lists.xen.org] On Behalf Of Wei Wang
>> Sent: Thursday, March 08, 2012 9:22 PM
>> To: Ian.Jackson@eu.citrix.com; Ian.Campbell@citrix.com; JBeulich@suse.com;
>> keir@xen.org
>> Cc: xen-devel@lists.xensource.com
>> Subject: [Xen-devel] [PATCH 3 of 6 V6] hvmloader: Build IVRS table
>>
>> # HG changeset patch
>> # User Wei Wang <wei.wang2@amd.com>
>> # Date 1331210217 -3600
>> # Node ID d0611a8ee06d3f34de1c7c51da8571d9e1a668e1
>> # Parent  e9d74ec1077472f9127c43903811ce3107fc038d
>> hvmloader: Build IVRS table.
>>
>> There are 32 ivrs padding entries allocated at the beginning. If a passthru
>> device has been found on the qemu bus, a padding entry will be replaced by a
>> real device entry.
This patch has been tested with both rombios and seabios >> >> Signed-off-by: Wei Wang<wei.wang2@amd.com> >> >> diff -r e9d74ec10774 -r d0611a8ee06d >> tools/firmware/hvmloader/acpi/acpi2_0.h >> --- a/tools/firmware/hvmloader/acpi/acpi2_0.h Thu Mar 08 13:36:54 >> 2012 +0100 >> +++ b/tools/firmware/hvmloader/acpi/acpi2_0.h Thu Mar 08 13:36:57 >> 2012 +0100 >> @@ -389,6 +389,60 @@ struct acpi_20_madt_intsrcovr { #define >> ACPI_2_0_WAET_REVISION 0x01 #define ACPI_1_0_FADT_REVISION 0x01 >> >> +#define IVRS_SIGNATURE ASCII32(''I'',''V'',''R'',''S'') >> +#define IVRS_REVISION 1 >> +#define IVRS_VASIZE 64 >> +#define IVRS_PASIZE 52 >> +#define IVRS_GVASIZE 64 >> + >> +#define IVHD_BLOCK_TYPE 0x10 >> +#define IVHD_FLAG_HTTUNEN (1<< 0) >> +#define IVHD_FLAG_PASSPW (1<< 1) >> +#define IVHD_FLAG_RESPASSPW (1<< 2) >> +#define IVHD_FLAG_ISOC (1<< 3) >> +#define IVHD_FLAG_IOTLBSUP (1<< 4) >> +#define IVHD_FLAG_COHERENT (1<< 5) >> +#define IVHD_FLAG_PREFSUP (1<< 6) >> +#define IVHD_FLAG_PPRSUP (1<< 7) >> + >> +#define IVHD_EFR_GTSUP (1<< 2) >> +#define IVHD_EFR_IASUP (1<< 5) >> + >> +#define IVHD_SELECT_4_BYTE 0x2 >> + >> +struct ivrs_ivhd_block >> +{ >> + uint8_t type; >> + uint8_t flags; >> + uint16_t length; >> + uint16_t devid; >> + uint16_t cap_offset; >> + uint64_t iommu_base_addr; >> + uint16_t pci_segment; >> + uint16_t iommu_info; >> + uint32_t reserved; >> +}; >> + >> +/* IVHD 4-byte device entries */ >> +struct ivrs_ivhd_device >> +{ >> + uint8_t type; >> + uint16_t dev_id; >> + uint8_t flags; >> +}; >> + >> +#define PT_DEV_MAX_NR 32 >> +#define IOMMU_CAP_OFFSET 0x40 >> +struct acpi_40_ivrs >> +{ >> + struct acpi_header header; >> + uint32_t iv_info; >> + uint32_t reserved[2]; >> + struct ivrs_ivhd_block ivhd_block; >> + struct ivrs_ivhd_device ivhd_device[PT_DEV_MAX_NR]; >> +}; >> + >> + >> #pragma pack () >> >> struct acpi_config { >> diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/acpi/build.c >> --- a/tools/firmware/hvmloader/acpi/build.c Thu Mar 08 
13:36:54 2012 >> +0100 >> +++ b/tools/firmware/hvmloader/acpi/build.c Thu Mar 08 13:36:57 2012 >> +0100 >> @@ -23,6 +23,8 @@ >> #include "ssdt_pm.h" >> #include "../config.h" >> #include "../util.h" >> +#include "../hypercall.h" >> +#include<xen/hvm/params.h> >> >> #define align16(sz) (((sz) + 15)& ~15) >> #define fixed_strcpy(d, s) strncpy((d), (s), sizeof(d)) @@ -198,6 +200,87 @@ >> static struct acpi_20_waet *construct_wa >> return waet; >> } >> >> +extern uint32_t ptdev_bdf[PT_DEV_MAX_NR]; extern uint32_t ptdev_nr; >> +extern uint32_t iommu_bdf; static struct acpi_40_ivrs* >> +construct_ivrs(void) { >> + struct acpi_40_ivrs *ivrs; >> + uint64_t mmio; >> + struct ivrs_ivhd_block *ivhd; >> + struct ivrs_ivhd_device *dev_entry; >> + struct xen_hvm_param p; >> + >> + if (ptdev_nr == 0 || iommu_bdf == 0) return NULL; >> + >> + ivrs = mem_alloc(sizeof(*ivrs), 16); >> + if (!ivrs) >> + { >> + printf("unable to build IVRS tables: out of memory\n"); >> + return NULL; >> + } >> + memset(ivrs, 0, sizeof(*ivrs)); >> + >> + /* initialize acpi header */ >> + ivrs->header.signature = IVRS_SIGNATURE; >> + ivrs->header.revision = IVRS_REVISION; >> + fixed_strcpy(ivrs->header.oem_id, ACPI_OEM_ID); >> + fixed_strcpy(ivrs->header.oem_table_id, ACPI_OEM_TABLE_ID); >> + >> + ivrs->header.oem_revision = ACPI_OEM_REVISION; >> + ivrs->header.creator_id = ACPI_CREATOR_ID; >> + ivrs->header.creator_revision = ACPI_CREATOR_REVISION; >> + >> + ivrs->header.length = sizeof(*ivrs); >> + >> + /* initialize IVHD Block */ >> + ivhd =&ivrs->ivhd_block; >> + ivrs->iv_info = (IVRS_VASIZE<< 15) | (IVRS_PASIZE<< 8) | >> + (IVRS_GVASIZE<< 5); >> + >> + ivhd->type = IVHD_BLOCK_TYPE; >> + ivhd->flags = IVHD_FLAG_PPRSUP | IVHD_FLAG_IOTLBSUP; >> + ivhd->devid = iommu_bdf; >> + ivhd->cap_offset = IOMMU_CAP_OFFSET; >> + >> + /*reserve 32K IOMMU MMIO space */ >> + mmio = virt_to_phys(mem_alloc(0x8000, 0x1000)); >> + if (!mmio) >> + { >> + printf("unable to reserve iommu mmio pages: out of memory\n"); >> + 
return NULL; >> + } >> + >> + p.domid = DOMID_SELF; >> + p.index = HVM_PARAM_IOMMU_BASE; >> + p.value = mmio; >> + >> + /* Return non-zero if IOMMUv2 hardware is not avaliable */ >> + if ( hypercall_hvm_op(HVMOP_set_param,&p) ) >> + { >> + printf("unable to set iommu mmio base address\n"); >> + return NULL; >> + } >> + >> + ivhd->iommu_base_addr = mmio; >> + ivhd->reserved = IVHD_EFR_IASUP | IVHD_EFR_GTSUP; >> + >> + /* Build IVHD device entries */ >> + dev_entry = ivrs->ivhd_device; >> + for ( int i = 0; i< ptdev_nr; i++ ) >> + { >> + dev_entry[i].type = IVHD_SELECT_4_BYTE; >> + dev_entry[i].dev_id = ptdev_bdf[i]; >> + dev_entry[i].flags = 0; >> + } >> + >> + ivhd->length = sizeof(*ivhd) + sizeof(*dev_entry) * PT_DEV_MAX_NR; >> + set_checksum(ivrs, offsetof(struct acpi_header, checksum), >> + ivrs->header.length); >> + >> + return ivrs; >> +} >> + >> static int construct_secondary_tables(unsigned long *table_ptrs, >> struct acpi_info *info) { @@ -206,6 +289,7 @@ static int >> construct_secondary_tables(un >> struct acpi_20_hpet *hpet; >> struct acpi_20_waet *waet; >> struct acpi_20_tcpa *tcpa; >> + struct acpi_40_ivrs *ivrs; >> unsigned char *ssdt; >> static const uint16_t tis_signature[] = {0x0001, 0x0001, 0x0001}; >> uint16_t *tis_hdr; >> @@ -293,6 +377,13 @@ static int construct_secondary_tables(un >> } >> } >> >> + if ( !strncmp(xenstore_read("guest_iommu", "1"), "1", 1) ) >> + { >> + ivrs = construct_ivrs(); >> + if ( ivrs != NULL ) >> + table_ptrs[nr_tables++] = (unsigned long)ivrs; >> + } >> + >> table_ptrs[nr_tables] = 0; >> return nr_tables; >> } >> diff -r e9d74ec10774 -r d0611a8ee06d tools/firmware/hvmloader/pci.c >> --- a/tools/firmware/hvmloader/pci.c Thu Mar 08 13:36:54 2012 +0100 >> +++ b/tools/firmware/hvmloader/pci.c Thu Mar 08 13:36:57 2012 +0100 >> @@ -34,11 +34,17 @@ unsigned long pci_mem_end = PCI_MEM_END; >> enum virtual_vga virtual_vga = VGA_none; unsigned long >> igd_opregion_pgbase = 0; >> >> +/* support up to 32 passthrough devices */ >> 
+#define PT_DEV_MAX_NR 32 >> +uint32_t ptdev_bdf[PT_DEV_MAX_NR]; >> +uint32_t ptdev_nr; >> +uint32_t iommu_bdf = 0; >> + >> void pci_setup(void) >> { >> uint32_t base, devfn, bar_reg, bar_data, bar_sz, cmd, mmio_total = 0; >> uint32_t vga_devfn = 256; >> - uint16_t class, vendor_id, device_id; >> + uint16_t class, vendor_id, device_id, sub_vendor_id; >> unsigned int bar, pin, link, isa_irq; >> >> /* Resources assignable to PCI devices via BARs. */ @@ -72,12 +78,34 @@ >> void pci_setup(void) >> class = pci_readw(devfn, PCI_CLASS_DEVICE); >> vendor_id = pci_readw(devfn, PCI_VENDOR_ID); >> device_id = pci_readw(devfn, PCI_DEVICE_ID); >> + sub_vendor_id = pci_readw(devfn, PCI_SUBSYSTEM_VENDOR_ID); >> + >> if ( (vendor_id == 0xffff)&& (device_id == 0xffff) ) >> continue; >> >> ASSERT((devfn != PCI_ISA_DEVFN) || >> ((vendor_id == 0x8086)&& (device_id == 0x7000))); >> >> + /* Found amd iommu device. */ >> + if ( class == 0x0806&& vendor_id == 0x1022 ) >> + { >> + iommu_bdf = devfn; >> + continue; >> + } >> + /* IVRS: Detecting passthrough devices. >> + * sub_vendor_id != citrix&& sub_vendor_id != qemu */ >> + if ( sub_vendor_id != 0x5853&& sub_vendor_id != 0x1af4 ) >> + { >> + /* found a passthru device */ >> + if ( ptdev_nr< PT_DEV_MAX_NR ) >> + { >> + ptdev_bdf[ptdev_nr] = devfn; >> + ptdev_nr++; >> + } >> + else >> + printf("Number of passthru devices> PT_DEV_MAX_NR \n"); >> + } >> + >> switch ( class ) >> { >> case 0x0300: >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xen.org >> http://lists.xen.org/xen-devel >
> -----Original Message-----
> From: Wei Wang [mailto:wei.wang2@amd.com]
> Sent: Thursday, March 08, 2012 10:50 PM
> To: Zhang, Xiantao
> Cc: Ian.Jackson@eu.citrix.com; Ian.Campbell@citrix.com; JBeulich@suse.com;
> keir@xen.org; xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [PATCH 3 of 6 V6] hvmloader: Build IVRS table
>
> On 03/08/2012 03:22 PM, Zhang, Xiantao wrote:
> > I think this IVRS table should be vendor-specific, and we should have a
> > mechanism to make it work only for AMD IOMMU. Intel has similar support
> > in the next generation of VT-d, so a DMAR table should likewise be built
> > when virtual VT-d is enabled for the guest. I suggest this table be
> > built only when the guest runs on AMD platforms.
> > Thanks!
> > Xiantao
>
> Hi Xiantao,
> Thanks for reviewing it. Actually construct_ivrs() invokes the hypercall
> guest_iommu_set_base(), which fails on non-iommuv2 systems, including
> VT-d. So the IVRS table is already avoided on Intel systems. But I am
> also thinking that maybe we should let the user choose the iommu hardware
> up front; for example, guest_iommu={vtd, amd, 0} to distinguish different
> iommu hardware?

Hi, Wei
The term "iommu" should be vendor-neutral, and the option guest_iommu={0,1}
is also fine for VT-d. The guest's iommu depends on the host's capability,
so we can't let the user choose options like virtual iommu_v2 on VT-d or
virtual VT-d on iommu_v2. Basically, in hvmloader we should detect the
platform's vendor and then build the corresponding ACPI tables. For the
hypercall, we should review the parameters and make it work for both sides;
it may have different implementations for AMD's and Intel's platforms in
the hypervisor.
Thanks!
Xiantao

> >> [...]
Ian Jackson
2012-Mar-12 11:25 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
Wei Wang writes ("Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc"):
> The former one will be used by qemu. But Ian said we would not add new
> features to qemu-traditional, so I am waiting for the pci passthru patches
> for upstream qemu to be accepted before sending out my qemu patch. I also
> have a proof-of-concept patch based on the old qemu. I could rebase and
> resend it if necessary.
>
> http://lists.xen.org/archives/html/xen-devel/2011-12/msg01269.html

How confident are we that the new hypercall will be needed for whatever
shape of passthrough support ends up in upstream qemu?

Ian.
Wei Wang
2012-Mar-12 15:43 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
On 03/12/2012 12:25 PM, Ian Jackson wrote:
> Wei Wang writes ("Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc"):
>> The former one will be used by qemu. But Ian said we would not add new
>> features to qemu-traditional, so I am waiting for the pci passthru
>> patches for upstream qemu to be accepted before sending out my qemu
>> patch. I also have a proof-of-concept patch based on the old qemu. I
>> could rebase and resend it if necessary.
>>
>> http://lists.xen.org/archives/html/xen-devel/2011-12/msg01269.html
>
> How confident are we that the new hypercall will be needed for whatever
> shape of passthrough support ends up in upstream qemu?
>
> Ian.

Hi Ian,
Well, the hypercall itself is very simple and straightforward. It just
allows qemu to notify Xen of the guest iommu msi configuration. Beyond
that, qemu also needs to register a virtual iommu device on the qemu bus.
Since we might not wait for upstream qemu for the 4.2 release, I would
like to suggest that we take the qemu patch back into traditional qemu.
This would give Xen 4.2 complete support for iommuv2 regardless of how
upstream qemu turns out, and would also help Linux distros based on
Xen 4.2 to have full iommuv2 support. What do you think?
Thanks,
Wei
Ian Jackson
2012-Mar-14 11:55 UTC
Re: [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc
Wei Wang writes ("Re: [Xen-devel] [PATCH 1 of 6 V6] amd iommu: Add 2 hypercalls for libxc"):
> Well, the hypercall itself is very simple and straightforward. It just
> allows qemu to notify Xen of the guest iommu msi configuration. Beyond
> that, qemu also needs to register a virtual iommu device on the qemu bus.
> Since we might not wait for upstream qemu for the 4.2 release, I would
> like to suggest that we take the qemu patch back into traditional qemu.
> This would give Xen 4.2 complete support for iommuv2 regardless of how
> upstream qemu turns out, and would also help Linux distros based on
> Xen 4.2 to have full iommuv2 support. What do you think?

We really are trying to keep qemu-xen-unstable feature-frozen. It is a
shame that the necessary features are still delayed getting into qemu
upstream, which has so far prevented us from switching the default over.

Ian.

(qemu-devel added to the CC)
Ian Jackson
2012-Mar-14 14:02 UTC
Re: [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
Wei Wang writes ("[Xen-devel] [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems"):
> This is patch set v6. It includes all pending patches that are needed to
> enable gpgpu passthrough and heterogeneous computing for guests.

I was looking at this again and it occurred to me to ask: what is
"heterogeneous computing for guests"?

Ian.
Ian Campbell
2012-Mar-14 14:09 UTC
Re: [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
On Thu, 2012-03-08 at 13:21 +0000, Wei Wang wrote:
> Hi,
> This is patch set v6. It includes all pending patches that are needed to
> enable gpgpu passthrough

I'm curious: is any of this actually specific to GPGPU passthrough, or is
it just that GPGPU passthrough happens to be the main (only?) reason you
would want to expose a virtualised IOMMU to the guest OS? Are there any
other use cases?

Ian.
Wei Wang
2012-Mar-14 14:32 UTC
Re: [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
On 03/14/2012 03:02 PM, Ian Jackson wrote:
> Wei Wang writes ("[Xen-devel] [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems"):
>> This is patch set v6. It includes all pending patches that are needed to
>> enable gpgpu passthrough and heterogeneous computing for guests.
>
> I was looking at this again and it occurred to me to ask: what is
> "heterogeneous computing for guests"?
>
> Ian.

It means passing a gpgpu through to the guest and running OpenCL
applications in the guest OS. Using the gpu for general-purpose computing
is possible on a native OS now; this patch set just implements the same
thing for the guest OS. It would be a nice feature for 4.2, wouldn't it?

Thanks,
Wei
Wei Wang
2012-Mar-14 14:48 UTC
Re: [PATCH 0 of 6 V6] amd iommu: support ats/gpgpu passthru on iommuv2 systems
On 03/14/2012 03:09 PM, Ian Campbell wrote:
> On Thu, 2012-03-08 at 13:21 +0000, Wei Wang wrote:
>> Hi,
>> This is patch set v6. It includes all pending patches that are needed to
>> enable gpgpu passthrough
>
> I'm curious: is any of this actually specific to GPGPU passthrough, or is
> it just that GPGPU passthrough happens to be the main (only?) reason you
> would want to expose a virtualised IOMMU to the guest OS? Are there any
> other use cases?
>
> Ian.

For now, gpgpu is the only use case. But in fact any sophisticated ATS
device that supports the PASID and PRI capabilities will need to work with
the iommuv2 driver in the guest OS, and will therefore also need a virtual
iommu in guest space. Please see the ATS spec 1.1; those capabilities are
documented there and are not specific to GPGPU.
Thanks,
Wei