Jean-Philippe Brucker
2019-Nov-22 10:49 UTC
[RFC 00/13] virtio-iommu on non-devicetree platforms
I'm seeking feedback on multi-platform support for virtio-iommu. At the
moment only devicetree (DT) is supported, and we don't have a pleasant
solution for other platforms. Once we figure out the topology description,
x86 support is trivial.

Since the IOMMU manages memory accesses from other devices, the guest kernel
needs to initialize the IOMMU before endpoints start issuing DMA. It's a
solved problem: firmware or hypervisor describes the device dependencies
through DT or ACPI tables, and probe of endpoints is deferred until the
IOMMU is probed. But:

(1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT
    for Arm). From my point of view IORT is easier to extend, since we just
    need to introduce a new node type. There are no dependencies on Arm in
    the Linux IORT driver, so it works well with CONFIG_X86.

    However, there are concerns about other OS vendors feeling obligated to
    implement this new node, so Arm proposed introducing another ACPI table
    that can wrap any of DMAR, IVRS and IORT to extend it with new virtual
    nodes. A draft of this VIOT table specification is available at
    http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf

    I'm afraid this could increase fragmentation, as guests would need to
    implement or modify their support for all of DMAR, IVRS and IORT. If we
    end up doing VIOT, I suggest limiting it to IORT.

(2) In addition, there are some concerns about having virtio depend on ACPI
    or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86 [1])
    don't currently implement those methods.

    It was suggested to embed the topology description into the device. It
    can work, as demonstrated at the end of this RFC, with the following
    limitations:

    - The topology description must be read before any endpoint managed by
      the IOMMU is probed, and even before the virtio module is loaded.
      This RFC uses a PCI quirk to manually parse the virtio configuration.
      It assumes that all endpoints managed by the IOMMU are under this same
      PCI host.

    - I don't have a solution for the virtio-mmio transport at the moment,
      because I haven't had time to modify a host to test it. I think it
      could either use a notifier on the platform bus, or better, a new
      'iommu' command-line argument to the virtio-mmio driver. So the
      current prototype doesn't work for Firecracker and microvm, which
      rely on virtio-mmio.

    - For Arm, if the platform has an ITS, the hypervisor needs IORT or DT
      to describe it anyway. More generally, not using either ACPI or DT
      might prevent supporting other features as well. I suspect the above
      users will have to implement a standard method sooner or later.

    - Even when reusing as much existing code as possible, guest support is
      still going to be around a few hundred lines, since we can't rely on
      the normal virtio infrastructure to be loaded at that point. As you
      can see below, the diffstat for the incomplete topology implementation
      (473 insertions) is already bigger than the exhaustive IORT support
      (448 insertions), even when jumping through the VIOT hoop.

    So it's a lightweight solution for very specific use-cases, and we
    should still support ACPI for the general case. Multi-platform guests
    such as Linux will then need to support three topology descriptions
    instead of two.

In this RFC I present both solutions, but I'd rather not keep all of it.
Please see the individual patches for details:

(1) Patches 1 and 3-10 add support for virtio-iommu to the Linux IORT
    driver, and patches 2 and 11 add the VIOT glue.

(2) Patch 12 adds the built-in topology description to the virtio-iommu
    specification. Patch 13 is a partial implementation for the Linux
    virtio-iommu driver. It only supports PCI, not platform devices.

You can find Linux and QEMU code on my virtio-iommu/devel branches at
http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu

I split the diffstat since there are two independent features. The first
one is for patches 1-11, and the second one for patch 13.
Jean-Philippe Brucker (11):
  ACPI/IORT: Move IORT to the ACPI folder
  ACPI: Add VIOT definitions
  ACPI/IORT: Allow registration of external tables
  ACPI/IORT: Add node categories
  ACPI/IORT: Support VIOT virtio-mmio node
  ACPI/IORT: Support VIOT virtio-pci node
  ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
  ACPI/IORT: Add callback to update a device's fwnode
  iommu/virtio: Create fwnode if necessary
  iommu/virtio: Update IORT fwnode
  ACPI: Add VIOT table

 MAINTAINERS                     |   9 +
 drivers/acpi/Kconfig            |   7 +
 drivers/acpi/Makefile           |   2 +
 drivers/acpi/arm64/Kconfig      |   3 -
 drivers/acpi/arm64/Makefile     |   1 -
 drivers/acpi/bus.c              |   2 +
 drivers/acpi/{arm64 => }/iort.c | 317 ++++++++++++++++++++++++++------
 drivers/acpi/tables.c           |   2 +-
 drivers/acpi/viot.c             |  44 +++++
 drivers/iommu/Kconfig           |   1 +
 drivers/iommu/virtio-iommu.c    |  61 +++++-
 include/acpi/actbl2.h           |  31 ++++
 include/linux/acpi_iort.h       |  14 ++
 include/linux/acpi_viot.h       |  20 ++
 14 files changed, 448 insertions(+), 66 deletions(-)
 rename drivers/acpi/{arm64 => }/iort.c (86%)
 create mode 100644 drivers/acpi/viot.c
 create mode 100644 include/linux/acpi_viot.h

Jean-Philippe Brucker (1):
  iommu/virtio: Add topology description to virtio-iommu config space

 drivers/base/platform.c               |   3 +
 drivers/iommu/Kconfig                 |   9 +
 drivers/iommu/Makefile                |   1 +
 drivers/iommu/virtio-iommu-topology.c | 410 ++++++++++++++++++++++++++
 drivers/iommu/virtio-iommu.c          |   3 +
 drivers/pci/pci-driver.c              |   3 +
 include/linux/virtio_iommu.h          |  18 ++
 include/uapi/linux/virtio_iommu.h     |  26 ++
 8 files changed, 473 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu-topology.c
 create mode 100644 include/linux/virtio_iommu.h

[1] firecracker: https://github.com/firecracker-microvm/firecracker
    microvm: https://github.com/qemu/qemu/blob/master/docs/microvm.rst
    kvmtool: https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/

-- 
2.24.0
Jean-Philippe Brucker
2019-Nov-22 10:49 UTC
[RFC 01/13] ACPI/IORT: Move IORT to the ACPI folder
IORT can be used (by QEMU) to describe a virtual topology containing an
architecture-agnostic paravirtualized device.

In order to build IORT for x86 systems, the driver has to be moved outside
of arm64/. Since there is nothing specific to arm64 in the driver, it
simply requires moving Makefile and Kconfig entries.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker at arm.com>
---
 MAINTAINERS                     | 9 +++++++++
 drivers/acpi/Kconfig            | 3 +++
 drivers/acpi/Makefile           | 1 +
 drivers/acpi/arm64/Kconfig      | 3 ---
 drivers/acpi/arm64/Makefile     | 1 -
 drivers/acpi/{arm64 => }/iort.c | 0
 6 files changed, 13 insertions(+), 4 deletions(-)
 rename drivers/acpi/{arm64 => }/iort.c (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index eb19fad370d7..9153d278f67e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -377,6 +377,15 @@ L:	platform-driver-x86 at vger.kernel.org
 S:	Maintained
 F:	drivers/platform/x86/i2c-multi-instantiate.c
 
+ACPI IORT DRIVER
+M:	Lorenzo Pieralisi <lorenzo.pieralisi at arm.com>
+M:	Hanjun Guo <guohanjun at huawei.com>
+M:	Sudeep Holla <sudeep.holla at arm.com>
+L:	linux-acpi at vger.kernel.org
+L:	linux-arm-kernel at lists.infradead.org (moderated for non-subscribers)
+S:	Maintained
+F:	drivers/acpi/iort.c
+
 ACPI PMIC DRIVERS
 M:	"Rafael J. Wysocki" <rjw at rjwysocki.net>
 M:	Len Brown <lenb at kernel.org>
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index ebe1e9e5fd81..548976c8b2b0 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -576,6 +576,9 @@ config TPS68470_PMIC_OPREGION
 	  region, which must be available before any of the devices
 	  using this, are probed.
 
+config ACPI_IORT
+	bool
+
 endif	# ACPI
 
 config X86_PM_TIMER
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 5d361e4e3405..9d1792165713 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -123,3 +123,4 @@ video-objs			+= acpi_video.o video_detect.o
 obj-y				+= dptf/
 
 obj-$(CONFIG_ARM64)		+= arm64/
+obj-$(CONFIG_ACPI_IORT) 	+= iort.o
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index 6dba187f4f2e..d0902c85d46e 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -3,8 +3,5 @@
 # ACPI Configuration for ARM64
 #
 
-config ACPI_IORT
-	bool
-
 config ACPI_GTDT
 	bool
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 6ff50f4ed947..38771a816caf 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -1,3 +1,2 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_ACPI_IORT) 	+= iort.o
 obj-$(CONFIG_ACPI_GTDT) 	+= gtdt.o
diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/iort.c
similarity index 100%
rename from drivers/acpi/arm64/iort.c
rename to drivers/acpi/iort.c
-- 
2.24.0
[RFC 02/13] ACPI: Add VIOT definitions
This is temporary, until the VIOT table is published and these
definitions are added to ACPICA.

Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org>
---
 include/acpi/actbl2.h | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/include/acpi/actbl2.h b/include/acpi/actbl2.h
index e45ced27f4c3..99c1d747e9d8 100644
--- a/include/acpi/actbl2.h
+++ b/include/acpi/actbl2.h
@@ -25,6 +25,7 @@
  * the wrong signature.
  */
 #define ACPI_SIG_IORT           "IORT"	/* IO Remapping Table */
+#define ACPI_SIG_VIOT           "VIOT"	/* Virtual I/O Table */
 #define ACPI_SIG_IVRS           "IVRS"	/* I/O Virtualization Reporting Structure */
 #define ACPI_SIG_LPIT           "LPIT"	/* Low Power Idle Table */
 #define ACPI_SIG_MADT           "APIC"	/* Multiple APIC Description Table */
@@ -412,6 +413,36 @@ struct acpi_ivrs_memory {
 	u64 memory_length;
 };
 
+/*******************************************************************************
+ *
+ * VIOT - Virtual I/O Table
+ *        Version 1
+ *
+ ******************************************************************************/
+
+struct acpi_table_viot {
+	struct acpi_table_header header;
+	u8 reserved[12];
+	struct acpi_table_header base_table;
+};
+
+#define ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU    0x80
+#define ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU   0x81
+
+struct acpi_viot_iort_virtio_pci_iommu {
+	u32 devid;
+};
+
+struct acpi_viot_iort_virtio_mmio_iommu {
+	u64 base_address;
+	u64 span;
+	u64 flags;
+	u64 interrupt;
+};
+
+/* FIXME: rename this monstrosity. */
+#define ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT (1<<0)
+
 /*******************************************************************************
  *
  * LPIT - Low Power Idle Table
-- 
2.24.0
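To make the wrapping concrete, here is a minimal userspace sketch of the draft layout: the standard 36-byte ACPI header, 12 reserved bytes, then the wrapped table's own header, so the base table starts at offset 48. The struct mirrors are assumptions modelled on ACPICA's packed definitions, and `viot_get_iort()` is a hypothetical helper, not part of this series (the actual VIOT parsing is patch 11, which is not shown here).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace mirror of the 36-byte ACPI table header; ACPICA packs it. */
struct acpi_table_header {
	char signature[4];
	uint32_t length;		/* whole table, including this header */
	uint8_t revision;
	uint8_t checksum;
	char oem_id[6];
	char oem_table_id[8];
	uint32_t oem_revision;
	char asl_compiler_id[4];
	uint32_t asl_compiler_revision;
} __attribute__((packed));

/* Mirror of the draft struct acpi_table_viot added by this patch. */
struct acpi_table_viot {
	struct acpi_table_header header;
	uint8_t reserved[12];
	struct acpi_table_header base_table;	/* start of the wrapped table */
} __attribute__((packed));

/*
 * Hypothetical helper: return the wrapped table if @buf holds a VIOT whose
 * base table is an IORT that fits inside the VIOT, NULL otherwise.
 */
static const struct acpi_table_header *viot_get_iort(const void *buf, size_t len)
{
	const struct acpi_table_viot *viot = buf;
	size_t off = offsetof(struct acpi_table_viot, base_table);

	if (len < sizeof(*viot) || viot->header.length > len)
		return NULL;
	if (memcmp(viot->header.signature, "VIOT", 4) != 0 ||
	    memcmp(viot->base_table.signature, "IORT", 4) != 0)
		return NULL;
	if (off + viot->base_table.length > viot->header.length)
		return NULL;
	return &viot->base_table;
}
```

A real VIOT would carry a complete IORT after the reserved bytes, so `base_table.length` would cover the IORT nodes too; the length checks only make sure that table stays within the VIOT.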
Jean-Philippe Brucker
2019-Nov-22 10:49 UTC
[RFC 03/13] ACPI/IORT: Allow registration of external tables
Add a function to register an IORT table from an external source.

Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org>
---
 drivers/acpi/iort.c       | 22 ++++++++++++++++++++--
 include/linux/acpi_iort.h | 10 ++++++++++
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index d62a9ea26fae..9c6c91e06f8f 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -144,6 +144,7 @@ typedef acpi_status (*iort_find_node_callback)
 
 /* Root pointer to the mapped IORT table */
 static struct acpi_table_header *iort_table;
+static enum iort_table_source iort_table_source;
 
 static LIST_HEAD(iort_msi_chip_list);
 static DEFINE_SPINLOCK(iort_msi_chip_lock);
@@ -1617,11 +1618,28 @@ static void __init iort_init_platform_devices(void)
 	}
 }
 
+void __init acpi_iort_register_table(struct acpi_table_header *table,
+				     enum iort_table_source source)
+{
+	/*
+	 * Firmware or hypervisor should know better than give us two IORT
+	 * tables.
+	 */
+	if (WARN_ON(iort_table))
+		return;
+
+	iort_table = table;
+	iort_table_source = source;
+
+	iort_init_platform_devices();
+}
+
 void __init acpi_iort_init(void)
 {
 	acpi_status status;
+	static struct acpi_table_header *table;
 
-	status = acpi_get_table(ACPI_SIG_IORT, 0, &iort_table);
+	status = acpi_get_table(ACPI_SIG_IORT, 0, &table);
 	if (ACPI_FAILURE(status)) {
 		if (status != AE_NOT_FOUND) {
 			const char *msg = acpi_format_exception(status);
@@ -1632,5 +1650,5 @@ void __init acpi_iort_init(void)
 		return;
 	}
 
-	iort_init_platform_devices();
+	acpi_iort_register_table(table, IORT_SOURCE_IORT);
 }
diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h
index 8e7e2ec37f1b..f4db5fff07cf 100644
--- a/include/linux/acpi_iort.h
+++ b/include/linux/acpi_iort.h
@@ -11,6 +11,11 @@
 #include <linux/fwnode.h>
 #include <linux/irqdomain.h>
 
+enum iort_table_source {
+	IORT_SOURCE_IORT,	/* The Real Thing */
+	IORT_SOURCE_VIOT,	/* Paravirtual extensions */
+};
+
 #define IORT_IRQ_MASK(irq)		(irq & 0xffffffffULL)
 #define IORT_IRQ_TRIGGER_MASK(irq)	((irq >> 32) & 0xffffffffULL)
 
@@ -27,6 +32,8 @@ int iort_register_domain_token(int trans_id, phys_addr_t base,
 void iort_deregister_domain_token(int trans_id);
 struct fwnode_handle *iort_find_domain_token(int trans_id);
 #ifdef CONFIG_ACPI_IORT
+void acpi_iort_register_table(struct acpi_table_header *table,
+			      enum iort_table_source source);
 void acpi_iort_init(void);
 u32 iort_msi_map_rid(struct device *dev, u32 req_id);
 struct irq_domain *iort_get_device_domain(struct device *dev, u32 req_id);
@@ -37,6 +44,9 @@ void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size);
 const struct iommu_ops *iort_iommu_configure(struct device *dev);
 int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head);
 #else
+static void acpi_iort_register_table(struct acpi_table_header *table,
+				     enum iort_table_source source)
+{ }
 static inline void acpi_iort_init(void) { }
 static inline u32 iort_msi_map_rid(struct device *dev, u32 req_id)
 { return req_id; }
-- 
2.24.0
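A userspace model of the registration rule in `acpi_iort_register_table()`: the first table wins, and its source is recorded so that VIOT-only node types can later be gated on `IORT_SOURCE_VIOT`. The name `register_iort_table()` and its `-1` return value are illustrative assumptions; the kernel function returns `void` and relies on `WARN_ON()`.

```c
#include <stdio.h>

enum iort_table_source {
	IORT_SOURCE_IORT,	/* The Real Thing */
	IORT_SOURCE_VIOT,	/* Paravirtual extensions */
};

static const void *iort_table;
static enum iort_table_source iort_table_source;

/*
 * Model of acpi_iort_register_table(): accept only the first table and
 * remember where it came from. Returns -1 (instead of WARN_ON()) when a
 * second table is offered.
 */
static int register_iort_table(const void *table, enum iort_table_source source)
{
	if (iort_table) {
		fprintf(stderr, "ignoring second IORT table\n");
		return -1;
	}
	iort_table = table;
	iort_table_source = source;
	/* The kernel would now walk the IORT nodes. */
	return 0;
}
```

Keeping a single global table means the node walk does not need to know where the IORT came from; only `iort_get_dev_cfg()` consults `iort_table_source` to decide whether VIOT node types are allowed.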
[RFC 04/13] ACPI/IORT: Add node categories
The current node filtering, based on an 8-bit type mask, won't work when
introducing node types greater than 7 (such as the virtio-iommu nodes,
0x80 and 0x81). Add iort_type_matches() to filter nodes by category.

Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org>
---
 drivers/acpi/iort.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index 9c6c91e06f8f..1d43fbc0001f 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -18,10 +18,10 @@
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 
-#define IORT_TYPE_MASK(type)	(1 << (type))
-#define IORT_MSI_TYPE		(1 << ACPI_IORT_NODE_ITS_GROUP)
-#define IORT_IOMMU_TYPE		((1 << ACPI_IORT_NODE_SMMU) |	\
-				(1 << ACPI_IORT_NODE_SMMU_V3))
+enum iort_node_category {
+	IORT_MSI_TYPE,
+	IORT_IOMMU_TYPE,
+};
 
 struct iort_its_msi_chip {
 	struct list_head	list;
@@ -38,6 +38,20 @@ struct iort_fwnode {
 static LIST_HEAD(iort_fwnode_list);
 static DEFINE_SPINLOCK(iort_fwnode_lock);
 
+static bool iort_type_matches(u8 type, enum iort_node_category category)
+{
+	switch (category) {
+	case IORT_IOMMU_TYPE:
+		return type == ACPI_IORT_NODE_SMMU ||
+		       type == ACPI_IORT_NODE_SMMU_V3;
+	case IORT_MSI_TYPE:
+		return type == ACPI_IORT_NODE_ITS_GROUP;
+	default:
+		WARN_ON(1);
+		return false;
+	}
+}
+
 /**
  * iort_set_fwnode() - Create iort_fwnode and use it to register
  *		       iommu data in the iort_fwnode_list
@@ -397,7 +411,7 @@ static int iort_get_id_mapping_index(struct acpi_iort_node *node)
 
 static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 					       u32 id_in, u32 *id_out,
-					       u8 type_mask)
+					       enum iort_node_category category)
 {
 	u32 id = id_in;
 
@@ -406,7 +420,7 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 		struct acpi_iort_id_mapping *map;
 		int i, index;
 
-		if (IORT_TYPE_MASK(node->type) & type_mask) {
+		if (iort_type_matches(node->type, category)) {
 			if (id_out)
 				*id_out = id;
 			return node;
@@ -458,8 +472,8 @@ static struct acpi_iort_node *iort_node_map_id(struct acpi_iort_node *node,
 }
 
 static struct acpi_iort_node *iort_node_map_platform_id(
-		struct acpi_iort_node *node, u32 *id_out, u8 type_mask,
-		int index)
+		struct acpi_iort_node *node, u32 *id_out,
+		enum iort_node_category category, int index)
 {
 	struct acpi_iort_node *parent;
 	u32 id;
@@ -475,8 +489,8 @@ static struct acpi_iort_node *iort_node_map_platform_id(
 	 * as NC (named component) -> SMMU -> ITS. If the type is matched,
 	 * return the initial dev id and its parent pointer directly.
 	 */
-	if (!(IORT_TYPE_MASK(parent->type) & type_mask))
-		parent = iort_node_map_id(parent, id, id_out, type_mask);
+	if (!iort_type_matches(parent->type, category))
+		parent = iort_node_map_id(parent, id, id_out, category);
 	else if (id_out)
 		*id_out = id;
 
-- 
2.24.0
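Why the bitmask had to go: `type_mask` is a `u8`, so only node types 0 to 7 have a bit, and the virtio-iommu types 0x80 and 0x81 cannot be expressed at all. The userspace sketch below contrasts the two schemes. The IORT node type values 0x00 to 0x05 are taken from the IORT specification and 0x80/0x81 from patch 02; the range guard in `mask_matches()` is an addition so the demonstration avoids an out-of-range shift. The category list mirrors the state after patch 06, which also accepts the virtio-pci node.

```c
#include <stdbool.h>
#include <stdint.h>

/* IORT node types (IORT spec) and the draft VIOT extensions (patch 02). */
enum {
	ACPI_IORT_NODE_ITS_GROUP		= 0x00,
	ACPI_IORT_NODE_NAMED_COMPONENT		= 0x01,
	ACPI_IORT_NODE_PCI_ROOT_COMPLEX		= 0x02,
	ACPI_IORT_NODE_SMMU			= 0x03,
	ACPI_IORT_NODE_SMMU_V3			= 0x04,
	ACPI_IORT_NODE_PMCG			= 0x05,
	ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU	= 0x80,
	ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU	= 0x81,
};

enum iort_node_category { IORT_MSI_TYPE, IORT_IOMMU_TYPE };

/* Old scheme: one bit per node type in a u8 mask. */
static bool mask_matches(uint8_t type, uint8_t type_mask)
{
	if (type >= 8)		/* no bit available for this type */
		return false;
	return (1u << type) & type_mask;
}

/* New scheme: explicit list of node types per category. */
static bool iort_type_matches(uint8_t type, enum iort_node_category category)
{
	switch (category) {
	case IORT_IOMMU_TYPE:
		return type == ACPI_IORT_NODE_SMMU ||
		       type == ACPI_IORT_NODE_SMMU_V3 ||
		       type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU ||
		       type == ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU;
	case IORT_MSI_TYPE:
		return type == ACPI_IORT_NODE_ITS_GROUP;
	}
	return false;
}
```

With the old scheme the IOMMU mask is `(1 << 3) | (1 << 4)`, i.e. 0x18; there is simply no bit left to add a virtio-iommu node, whereas the switch grows one comparison per new type.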
Jean-Philippe Brucker
2019-Nov-22 10:49 UTC
[RFC 05/13] ACPI/IORT: Support VIOT virtio-mmio node
Add a new type of node to the IORT driver that describes a virtio-iommu
device based on the virtio-mmio transport. The node is only available when
the IORT is a sub-table of the VIOT.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker at arm.com>
---
 drivers/acpi/iort.c | 66 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 62 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index 1d43fbc0001f..adc5953fffa5 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -43,7 +43,8 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
 	switch (category) {
 	case IORT_IOMMU_TYPE:
 		return type == ACPI_IORT_NODE_SMMU ||
-		       type == ACPI_IORT_NODE_SMMU_V3;
+		       type == ACPI_IORT_NODE_SMMU_V3 ||
+		       type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU;
 	case IORT_MSI_TYPE:
 		return type == ACPI_IORT_NODE_ITS_GROUP;
 	default:
@@ -868,8 +869,10 @@ static inline bool iort_iommu_driver_enabled(u8 type)
 		return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
 	case ACPI_IORT_NODE_SMMU:
 		return IS_BUILTIN(CONFIG_ARM_SMMU);
+	case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+		return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
 	default:
-		pr_warn("IORT node type %u does not describe an SMMU\n", type);
+		pr_warn("IORT node type %u does not describe an IOMMU\n", type);
 		return false;
 	}
 }
@@ -1408,6 +1411,46 @@ static int __init arm_smmu_v3_pmcg_add_platdata(struct platform_device *pdev)
 	return platform_device_add_data(pdev, &model, sizeof(model));
 }
 
+static int __init viommu_mmio_count_resources(struct acpi_iort_node *node)
+{
+	/* Mem + IRQ */
+	return 2;
+}
+
+static void __init viommu_mmio_init_resources(struct resource *res,
+					      struct acpi_iort_node *node)
+{
+	int hw_irq, trigger;
+	struct acpi_viot_iort_virtio_mmio_iommu *viommu;
+
+	viommu = (struct acpi_viot_iort_virtio_mmio_iommu *)node->node_data;
+
+	res[0].start = viommu->base_address;
+	res[0].end = viommu->base_address + viommu->span - 1;
+	res[0].flags = IORESOURCE_MEM;
+
+	hw_irq = IORT_IRQ_MASK(viommu->interrupt);
+	trigger = IORT_IRQ_TRIGGER_MASK(viommu->interrupt);
+	acpi_iort_register_irq(hw_irq, "viommu", trigger, res + 1);
+}
+
+static void __init viommu_mmio_dma_configure(struct device *dev,
+					     struct acpi_iort_node *node)
+{
+	enum dev_dma_attr attr;
+	struct acpi_viot_iort_virtio_mmio_iommu *viommu;
+
+	viommu = (struct acpi_viot_iort_virtio_mmio_iommu *)node->node_data;
+
+	attr = (viommu->flags & ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT) ?
+			DEV_DMA_COHERENT : DEV_DMA_NON_COHERENT;
+
+	dev->dma_mask = &dev->coherent_dma_mask;
+
+	/* Configure DMA for the page table walker */
+	acpi_dma_configure(dev, attr);
+}
+
 struct iort_dev_config {
 	const char *name;
 	int (*dev_init)(struct acpi_iort_node *node);
@@ -1443,6 +1486,14 @@ static const struct iort_dev_config iort_arm_smmu_v3_pmcg_cfg __initconst = {
 	.dev_add_platdata = arm_smmu_v3_pmcg_add_platdata,
 };
 
+static const struct iort_dev_config iort_viommu_mmio_cfg __initconst = {
+	/* Probe with the generic virtio-mmio driver */
+	.name = "virtio-mmio",
+	.dev_dma_configure = viommu_mmio_dma_configure,
+	.dev_count_resources = viommu_mmio_count_resources,
+	.dev_init_resources = viommu_mmio_init_resources,
+};
+
 static __init const struct iort_dev_config *iort_get_dev_cfg(
 			struct acpi_iort_node *node)
 {
@@ -1453,9 +1504,16 @@ static __init const struct iort_dev_config *iort_get_dev_cfg(
 		return &iort_arm_smmu_cfg;
 	case ACPI_IORT_NODE_PMCG:
 		return &iort_arm_smmu_v3_pmcg_cfg;
-	default:
-		return NULL;
 	}
+
+	if (iort_table_source == IORT_SOURCE_VIOT) {
+		switch (node->type) {
+		case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+			return &iort_viommu_mmio_cfg;
+		}
+	}
+
+	return NULL;
 }
 
 /**
-- 
2.24.0
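The resource setup in `viommu_mmio_init_resources()` and the coherency check in `viommu_mmio_dma_configure()` are plain arithmetic on the node fields. A minimal userspace sketch, assuming the draft field layout from patch 02: resource ranges are inclusive, so the end is `base_address + span - 1`, and the 64-bit `interrupt` field carries the interrupt number in its low 32 bits and the trigger flags in its high 32 bits. `struct mem_range`, `viommu_mmio_decode()` and `viommu_mmio_is_coherent()` are hypothetical stand-ins; the two IRQ macros are copied from `acpi_iort.h` with the argument parenthesized.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from acpi_iort.h, with the macro argument parenthesized. */
#define IORT_IRQ_MASK(irq)		((irq) & 0xffffffffULL)
#define IORT_IRQ_TRIGGER_MASK(irq)	(((irq) >> 32) & 0xffffffffULL)
#define ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT	(1 << 0)

/* Mirror of the draft node payload from patch 02. */
struct acpi_viot_iort_virtio_mmio_iommu {
	uint64_t base_address;
	uint64_t span;
	uint64_t flags;
	uint64_t interrupt;
};

/* Hypothetical stand-in for the kernel's struct resource. */
struct mem_range {
	uint64_t start;
	uint64_t end;	/* inclusive, like struct resource */
};

static void viommu_mmio_decode(const struct acpi_viot_iort_virtio_mmio_iommu *v,
			       struct mem_range *mem, uint32_t *hw_irq,
			       uint32_t *trigger)
{
	mem->start = v->base_address;
	mem->end = v->base_address + v->span - 1;
	/* Low 32 bits: interrupt number. High 32 bits: trigger flags. */
	*hw_irq = (uint32_t)IORT_IRQ_MASK(v->interrupt);
	*trigger = (uint32_t)IORT_IRQ_TRIGGER_MASK(v->interrupt);
}

/* Selects DEV_DMA_COHERENT vs DEV_DMA_NON_COHERENT in the patch. */
static bool viommu_mmio_is_coherent(const struct acpi_viot_iort_virtio_mmio_iommu *v)
{
	return v->flags & ACPI_VIOT_IORT_VIRTIO_MMIO_IOMMU_CACHE_COHERENT;
}
```

For example, a 0x200-byte window at 0x10000000 yields the inclusive range 0x10000000-0x100001ff, which is what the generic virtio-mmio driver then maps.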
Jean-Philippe Brucker
2019-Nov-22 10:49 UTC
[RFC 06/13] ACPI/IORT: Support VIOT virtio-pci node
When virtio-iommu uses the PCI transport, IORT doesn't instantiate the
device and doesn't create a fwnode. They will be created later by the PCI
subsystem. Store the information needed to identify the IOMMU in
iort_fwnode_list.

Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker at arm.com>
---
 drivers/acpi/iort.c | 117 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 93 insertions(+), 24 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index adc5953fffa5..b517aa4e83ba 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -30,10 +30,17 @@ struct iort_its_msi_chip {
 	u32			translation_id;
 };
 
+struct iort_pci_devid {
+	u16 segment;
+	u8 bus;
+	u8 devfn;
+};
+
 struct iort_fwnode {
 	struct list_head list;
 	struct acpi_iort_node *iort_node;
 	struct fwnode_handle *fwnode;
+	struct iort_pci_devid *pci_devid;
 };
 static LIST_HEAD(iort_fwnode_list);
 static DEFINE_SPINLOCK(iort_fwnode_lock);
@@ -44,7 +51,8 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
 	case IORT_IOMMU_TYPE:
 		return type == ACPI_IORT_NODE_SMMU ||
 		       type == ACPI_IORT_NODE_SMMU_V3 ||
-		       type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU;
+		       type == ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU ||
+		       type == ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU;
 	case IORT_MSI_TYPE:
 		return type == ACPI_IORT_NODE_ITS_GROUP;
 	default:
@@ -59,12 +67,14 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
  *
  * @node: IORT table node associated with the IOMMU
  * @fwnode: fwnode associated with the IORT node
+ * @pci_devid: pci device ID associated with the IORT node, may be NULL
  *
  * Returns: 0 on success
  *          <0 on failure
  */
 static inline int iort_set_fwnode(struct acpi_iort_node *iort_node,
-				  struct fwnode_handle *fwnode)
+				  struct fwnode_handle *fwnode,
+				  struct iort_pci_devid *pci_devid)
 {
 	struct iort_fwnode *np;
 
@@ -76,6 +86,7 @@ static inline int iort_set_fwnode(struct acpi_iort_node *iort_node,
 	INIT_LIST_HEAD(&np->list);
 	np->iort_node = iort_node;
 	np->fwnode = fwnode;
+	np->pci_devid = pci_devid;
 
 	spin_lock(&iort_fwnode_lock);
 	list_add_tail(&np->list, &iort_fwnode_list);
@@ -121,6 +132,7 @@ static inline void iort_delete_fwnode(struct acpi_iort_node *node)
 	spin_lock(&iort_fwnode_lock);
 	list_for_each_entry_safe(curr, tmp, &iort_fwnode_list, list) {
 		if (curr->iort_node == node) {
+			kfree(curr->pci_devid);
 			list_del(&curr->list);
 			kfree(curr);
 			break;
@@ -870,6 +882,7 @@ static inline bool iort_iommu_driver_enabled(u8 type)
 	case ACPI_IORT_NODE_SMMU:
 		return IS_BUILTIN(CONFIG_ARM_SMMU);
 	case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+	case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
 		return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
 	default:
 		pr_warn("IORT node type %u does not describe an IOMMU\n", type);
@@ -1451,6 +1464,28 @@ static void __init viommu_mmio_dma_configure(struct device *dev,
 	acpi_dma_configure(dev, attr);
 }
 
+static __init struct iort_pci_devid *
+viommu_pci_get_devid(struct acpi_iort_node *node)
+{
+	unsigned int val;
+	struct iort_pci_devid *devid;
+	struct acpi_viot_iort_virtio_pci_iommu *viommu;
+
+	viommu = (struct acpi_viot_iort_virtio_pci_iommu *)node->node_data;
+
+	val = le32_to_cpu(viommu->devid);
+
+	devid = kzalloc(sizeof(*devid), GFP_KERNEL);
+	if (!devid)
+		return ERR_PTR(-ENOMEM);
+
+	devid->segment = val >> 16;
+	devid->bus = PCI_BUS_NUM(val);
+	devid->devfn = val & 0xff;
+
+	return devid;
+}
+
 struct iort_dev_config {
 	const char *name;
 	int (*dev_init)(struct acpi_iort_node *node);
@@ -1462,6 +1497,7 @@ struct iort_dev_config {
 	int (*dev_set_proximity)(struct device *dev,
 				 struct acpi_iort_node *node);
 	int (*dev_add_platdata)(struct platform_device *pdev);
+	struct iort_pci_devid *(*dev_get_pci_devid)(struct acpi_iort_node *node);
 };
 
 static const struct iort_dev_config iort_arm_smmu_v3_cfg __initconst = {
@@ -1494,6 +1530,10 @@ static const struct iort_dev_config iort_viommu_mmio_cfg __initconst = {
 	.dev_init_resources = viommu_mmio_init_resources,
 };
 
+static const struct iort_dev_config iort_viommu_pci_cfg __initconst = {
+	.dev_get_pci_devid = viommu_pci_get_devid,
+};
+
 static __init const struct iort_dev_config *iort_get_dev_cfg(
 			struct acpi_iort_node *node)
 {
@@ -1510,6 +1550,8 @@ static __init const struct iort_dev_config *iort_get_dev_cfg(
 		switch (node->type) {
 		case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
 			return &iort_viommu_mmio_cfg;
+		case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
+			return &iort_viommu_pci_cfg;
 		}
 	}
 
@@ -1641,13 +1683,55 @@ static void __init iort_enable_acs(struct acpi_iort_node *iort_node)
 static inline void iort_enable_acs(struct acpi_iort_node *iort_node) { }
 #endif
 
-static void __init iort_init_platform_devices(void)
+static int __init iort_init_node(struct acpi_iort_node *iort_node)
+{
+	int ret;
+	const struct iort_dev_config *ops;
+	struct fwnode_handle *fwnode;
+
+	iort_enable_acs(iort_node);
+
+	ops = iort_get_dev_cfg(iort_node);
+	if (!ops)
+		return 0;
+
+	if (ops->dev_get_pci_devid) {
+		struct iort_pci_devid *pci_devid =
+			ops->dev_get_pci_devid(iort_node);
+
+		if (IS_ERR(pci_devid))
+			return PTR_ERR(pci_devid);
+		/*
+		 * For a PCI-based IOMMU, set the pci_devid handle now, but
+		 * leave the fwnode empty. It will be completed later when the
+		 * PCI device gets probed.
+		 */
+		iort_set_fwnode(iort_node, NULL, pci_devid);
+
+		return 0;
+	}
+
+	fwnode = acpi_alloc_fwnode_static();
+	if (!fwnode)
+		return -ENOMEM;
+
+	iort_set_fwnode(iort_node, fwnode, NULL);
+
+	ret = iort_add_platform_device(iort_node, ops);
+	if (ret) {
+		iort_delete_fwnode(iort_node);
+		acpi_free_fwnode_static(fwnode);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void __init iort_init_devices(void)
 {
 	struct acpi_iort_node *iort_node, *iort_end;
 	struct acpi_table_iort *iort;
-	struct fwnode_handle *fwnode;
-	int i, ret;
-	const struct iort_dev_config *ops;
+	int i;
 
 	/*
 	 * iort_table and iort both point to the start of IORT table, but
@@ -1667,23 +1751,8 @@ static void __init iort_init_platform_devices(void)
 			return;
 		}
 
-		iort_enable_acs(iort_node);
-
-		ops = iort_get_dev_cfg(iort_node);
-		if (ops) {
-			fwnode = acpi_alloc_fwnode_static();
-			if (!fwnode)
-				return;
-
-			iort_set_fwnode(iort_node, fwnode);
-
-			ret = iort_add_platform_device(iort_node, ops);
-			if (ret) {
-				iort_delete_fwnode(iort_node);
-				acpi_free_fwnode_static(fwnode);
-				return;
-			}
-		}
+		if (iort_init_node(iort_node))
+			return;
 
 		iort_node = ACPI_ADD_PTR(struct acpi_iort_node, iort_node,
 					 iort_node->length);
@@ -1703,7 +1772,7 @@ void __init acpi_iort_register_table(struct acpi_table_header *table,
 	iort_table = table;
 	iort_table_source = source;
 
-	iort_init_platform_devices();
+	iort_init_devices();
 }
 
 void __init acpi_iort_init(void)
-- 
2.24.0
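`viommu_pci_get_devid()` splits the 32-bit `devid` into a PCI segment (bits 31:16), a bus number (bits 15:8) and a devfn (bits 7:0, itself 5 bits of slot and 3 bits of function). A userspace sketch of the same decoding without the allocation; `PCI_BUS_NUM()`, `PCI_SLOT()` and `PCI_FUNC()` are reproduced from their Linux definitions, and `decode_devid()` is a hypothetical name. The input is assumed to be already converted from little-endian, as `le32_to_cpu()` does in the patch.

```c
#include <stdint.h>

/* Same definitions as the Linux PCI headers. */
#define PCI_BUS_NUM(x)		(((x) >> 8) & 0xff)
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

struct iort_pci_devid {
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* Decoding from viommu_pci_get_devid(), returned by value. */
static struct iort_pci_devid decode_devid(uint32_t val)
{
	struct iort_pci_devid d = {
		.segment = (uint16_t)(val >> 16),
		.bus = (uint8_t)PCI_BUS_NUM(val),
		.devfn = (uint8_t)(val & 0xff),
	};

	return d;
}
```

For instance, 0x00010208 decodes to segment 1, bus 2, devfn 0x08 (slot 1, function 0), which is the triple later compared against the probed `pci_dev`.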
Jean-Philippe Brucker
2019-Nov-22 10:49 UTC
[RFC 07/13] ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
When the IOMMU is PCI-based, IORT doesn't know the fwnode until the driver
has had a chance to register it. In addition to deferring the probe until
the IOMMU ops are set, also defer the probe until the fwspec is available.

Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org>
---
 drivers/acpi/iort.c | 54 ++++++++++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 23 deletions(-)

diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c
index b517aa4e83ba..f08f72d8af78 100644
--- a/drivers/acpi/iort.c
+++ b/drivers/acpi/iort.c
@@ -61,6 +61,22 @@ static bool iort_type_matches(u8 type, enum iort_node_category category)
 	}
 }
 
+static inline bool iort_iommu_driver_enabled(u8 type)
+{
+	switch (type) {
+	case ACPI_IORT_NODE_SMMU_V3:
+		return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
+	case ACPI_IORT_NODE_SMMU:
+		return IS_BUILTIN(CONFIG_ARM_SMMU);
+	case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
+	case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
+		return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
+	default:
+		pr_warn("IORT node type %u does not describe an IOMMU\n", type);
+		return false;
+	}
+}
+
 /**
  * iort_set_fwnode() - Create iort_fwnode and use it to register
  *		       iommu data in the iort_fwnode_list
@@ -102,9 +118,9 @@ static inline int iort_set_fwnode(struct acpi_iort_node *iort_node,
  *
  * Returns: fwnode_handle pointer on success, NULL on failure
  */
-static inline struct fwnode_handle *iort_get_fwnode(
-		struct acpi_iort_node *node)
+static inline struct fwnode_handle *iort_get_fwnode(struct acpi_iort_node *node)
 {
+	int err = -ENODEV;
 	struct iort_fwnode *curr;
 	struct fwnode_handle *fwnode = NULL;
 
@@ -112,12 +128,20 @@ static inline struct fwnode_handle *iort_get_fwnode(
 	list_for_each_entry(curr, &iort_fwnode_list, list) {
 		if (curr->iort_node == node) {
 			fwnode = curr->fwnode;
+			if (!fwnode && curr->pci_devid) {
+				/*
+				 * Postpone probe until virtio-iommu has
+				 * registered its fwnode.
+				 */
+				err = iort_iommu_driver_enabled(node->type) ?
+					-EPROBE_DEFER : -ENODEV;
+			}
 			break;
 		}
 	}
 	spin_unlock(&iort_fwnode_lock);
 
-	return fwnode;
+	return fwnode ?: ERR_PTR(err);
 }
 
 /**
@@ -874,22 +898,6 @@ int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head)
 	return (resv == its->its_count) ? resv : -ENODEV;
 }
 
-static inline bool iort_iommu_driver_enabled(u8 type)
-{
-	switch (type) {
-	case ACPI_IORT_NODE_SMMU_V3:
-		return IS_BUILTIN(CONFIG_ARM_SMMU_V3);
-	case ACPI_IORT_NODE_SMMU:
-		return IS_BUILTIN(CONFIG_ARM_SMMU);
-	case ACPI_VIOT_IORT_NODE_VIRTIO_MMIO_IOMMU:
-	case ACPI_VIOT_IORT_NODE_VIRTIO_PCI_IOMMU:
-		return IS_ENABLED(CONFIG_VIRTIO_IOMMU);
-	default:
-		pr_warn("IORT node type %u does not describe an IOMMU\n", type);
-		return false;
-	}
-}
-
 static int arm_smmu_iort_xlate(struct device *dev, u32 streamid,
 			       struct fwnode_handle *fwnode,
 			       const struct iommu_ops *ops)
@@ -920,8 +928,8 @@ static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
 		return -ENODEV;
 
 	iort_fwnode = iort_get_fwnode(node);
-	if (!iort_fwnode)
-		return -ENODEV;
+	if (IS_ERR(iort_fwnode))
+		return PTR_ERR(iort_fwnode);
 
 	/*
 	 * If the ops look-up fails, this means that either
@@ -1618,8 +1626,8 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
 
 	fwnode = iort_get_fwnode(node);
 
-	if (!fwnode) {
-		ret = -ENODEV;
+	if (IS_ERR(fwnode)) {
+		ret = PTR_ERR(fwnode);
 		goto dev_put;
 	}
 
-- 
2.24.0
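After this patch, `iort_get_fwnode()` never returns NULL: it returns either a valid fwnode or an error pointer, and the error is `-EPROBE_DEFER` only when the IOMMU is a PCI device whose driver is enabled but has not registered its fwnode yet. The userspace sketch below reimplements just enough of the kernel's error-pointer helpers to show that decision. `struct fwnode_entry` and `get_fwnode()` are simplified, hypothetical stand-ins for `struct iort_fwnode` and the list lookup, and `EPROBE_DEFER` (517) is a kernel-internal errno that userspace headers don't define.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EPROBE_DEFER	517	/* kernel-internal: "retry probe later" */
#define MAX_ERRNO	4095

/* Userspace stand-ins for the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses encode negative errno values. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, simplified entry of iort_fwnode_list. */
struct fwnode_entry {
	void *fwnode;		/* NULL until the PCI driver registers it */
	bool has_pci_devid;	/* the IOMMU is a PCI device */
	bool driver_enabled;	/* e.g. IS_ENABLED(CONFIG_VIRTIO_IOMMU) */
};

static void *get_fwnode(const struct fwnode_entry *e)
{
	int err = -ENODEV;

	if (e->fwnode)
		return e->fwnode;
	/* Only a PCI IOMMU whose driver can still show up is worth waiting for. */
	if (e->has_pci_devid)
		err = e->driver_enabled ? -EPROBE_DEFER : -ENODEV;
	return ERR_PTR(err);
}
```

Returning `-EPROBE_DEFER` up through `iort_iommu_xlate()` makes the driver core put the endpoint on the deferred-probe list, so it is retried once the virtio-iommu PCI driver has called back with its fwnode.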
Jean-Philippe Brucker
2019-Nov-22 10:49 UTC
[RFC 08/13] ACPI/IORT: Add callback to update a device's fwnode
For a PCI-based IOMMU, IORT isn't in charge of allocating a fwnode. Let the IOMMU driver update the fwnode associated to an IORT node when available. Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org> --- drivers/acpi/iort.c | 38 ++++++++++++++++++++++++++++++++++++++ include/linux/acpi_iort.h | 4 ++++ 2 files changed, 42 insertions(+) diff --git a/drivers/acpi/iort.c b/drivers/acpi/iort.c index f08f72d8af78..8263ab275b2b 100644 --- a/drivers/acpi/iort.c +++ b/drivers/acpi/iort.c @@ -1038,11 +1038,49 @@ const struct iommu_ops *iort_iommu_configure(struct device *dev) return ops; } + +/** + * iort_iommu_update_fwnode - update fwnode of a PCI IOMMU + * @dev: the IOMMU device + * @fwnode: the fwnode, or NULL to remove an existing fwnode + * + * A PCI device isn't instantiated by the IORT driver. The IOMMU driver sets or + * removes its fwnode using this function. + */ +void iort_iommu_update_fwnode(struct device *dev, struct fwnode_handle *fwnode) +{ + struct pci_dev *pdev; + struct iort_fwnode *curr; + struct iort_pci_devid *devid; + + if (!dev_is_pci(dev)) + return; + + pdev = to_pci_dev(dev); + + spin_lock(&iort_fwnode_lock); + list_for_each_entry(curr, &iort_fwnode_list, list) { + devid = curr->pci_devid; + if (devid && + pci_domain_nr(pdev->bus) == devid->segment && + pdev->bus->number == devid->bus && + pdev->devfn == devid->devfn) { + WARN_ON(fwnode && curr->fwnode); + curr->fwnode = fwnode; + break; + } + } + spin_unlock(&iort_fwnode_lock); +} +EXPORT_SYMBOL_GPL(iort_iommu_update_fwnode); #else int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { return 0; } const struct iommu_ops *iort_iommu_configure(struct device *dev) { return NULL; } +static void iort_iommu_update_fwnode(struct device *dev, + struct fwnode_handle *fwnode) +{ } #endif static int nc_dma_get_range(struct device *dev, u64 *size) diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index f4db5fff07cf..840635e40d9d 100644 --- 
a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -43,6 +43,7 @@ int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); void iort_dma_setup(struct device *dev, u64 *dma_addr, u64 *size); const struct iommu_ops *iort_iommu_configure(struct device *dev); int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); +void iort_iommu_update_fwnode(struct device *dev, struct fwnode_handle *fwnode); #else static void acpi_iort_register_table(struct acpi_table_header *table, enum iort_table_source source) @@ -63,6 +64,9 @@ static inline const struct iommu_ops *iort_iommu_configure( static inline int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) { return 0; } +static inline void iort_iommu_update_fwnode(struct device *dev, + struct fwnode_handle *fwnode) +{ } #endif #endif /* __ACPI_IORT_H__ */ -- 2.24.0
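The lookup in iort_iommu_update_fwnode() identifies the IOMMU's PCI function by segment, bus number and devfn. A minimal userspace model of that comparison is sketched below, not kernel code: struct pci_devid is a hypothetical stand-in for struct iort_pci_devid, and the two helpers are assumed to follow the bit layout of the kernel's PCI_DEVFN() and PCI_DEVID() macros (5-bit slot, 3-bit function, bus in the high byte of the 16-bit requester ID).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_DEVFN(): 5-bit slot, 3-bit function. */
static uint8_t devfn(uint8_t slot, uint8_t func)
{
	return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

/* Same layout as PCI_DEVID(): bus in the high byte, devfn in the low byte. */
static uint16_t requester_id(uint8_t bus, uint8_t dfn)
{
	return (uint16_t)((bus << 8) | dfn);
}

/* Hypothetical userspace stand-in for struct iort_pci_devid. */
struct pci_devid {
	uint16_t segment;
	uint8_t bus;
	uint8_t devfn;
};

/* The three-way comparison done in iort_iommu_update_fwnode(). */
static bool devid_matches(const struct pci_devid *id, uint16_t segment,
			  uint8_t bus, uint8_t dfn)
{
	assert(id);
	return id->segment == segment && id->bus == bus && id->devfn == dfn;
}
```

The same 16-bit value produced by requester_id() is what patch 13 compares against the requester_start/requester_end bounds of a PCI range.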
Jean-Philippe Brucker
2019-Nov-22 10:49 UTC
[RFC 09/13] iommu/virtio: Create fwnode if necessary
The presence of a fwnode on a PCI device depends on the platform. QEMU q35, for example, creates an ACPI description for each PCI slot, but QEMU virt (aarch64) doesn't. Since the IOMMU subsystem relies heavily on fwnode to discover the DMA topology, create a fwnode for the virtio-iommu if necessary, using the software_node framework. Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org> --- drivers/iommu/virtio-iommu.c | 56 ++++++++++++++++++++++++++++++++---- 1 file changed, 51 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 3ea9d7682999..8efa368134c0 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -966,6 +966,48 @@ static struct iommu_ops viommu_ops = { .of_xlate = viommu_of_xlate, }; +static int viommu_set_fwnode(struct viommu_dev *viommu) +{ + /* + * viommu->dev is the virtio device, its parent is the associated + * transport device. + */ + struct device *dev = viommu->dev->parent; + + /* + * With device tree a fwnode is always present. With ACPI, on some + * platforms a PCI device has a DSDT node describing the slot. On other + * platforms, no fwnode is created and we have to do it ourselves. 
+ */ + if (!dev->fwnode) { + struct fwnode_handle *fwnode; + + fwnode = fwnode_create_software_node(NULL, NULL); + if (IS_ERR(fwnode)) + return PTR_ERR(fwnode); + + set_primary_fwnode(dev, fwnode); + } + + iommu_device_set_fwnode(&viommu->iommu, dev->fwnode); + return 0; +} + +static void viommu_clear_fwnode(struct viommu_dev *viommu) +{ + struct device *dev = viommu->dev->parent; + + if (!dev->fwnode) + return; + + if (is_software_node(dev->fwnode)) { + struct fwnode_handle *fwnode = dev->fwnode; + + set_primary_fwnode(dev, NULL); + fwnode_remove_software_node(fwnode); + } +} + static int viommu_init_vqs(struct viommu_dev *viommu) { struct virtio_device *vdev = dev_to_virtio(viommu->dev); @@ -1004,7 +1046,6 @@ static int viommu_fill_evtq(struct viommu_dev *viommu) static int viommu_probe(struct virtio_device *vdev) { - struct device *parent_dev = vdev->dev.parent; struct viommu_dev *viommu = NULL; struct device *dev = &vdev->dev; u64 input_start = 0; @@ -1084,9 +1125,11 @@ static int viommu_probe(struct virtio_device *vdev) if (ret) goto err_free_vqs; - iommu_device_set_ops(&viommu->iommu, &viommu_ops); - iommu_device_set_fwnode(&viommu->iommu, parent_dev->fwnode); + ret = viommu_set_fwnode(viommu); + if (ret) + goto err_sysfs_remove; + iommu_device_set_ops(&viommu->iommu, &viommu_ops); iommu_device_register(&viommu->iommu); #ifdef CONFIG_PCI @@ -1119,8 +1162,10 @@ static int viommu_probe(struct virtio_device *vdev) return 0; err_unregister: - iommu_device_sysfs_remove(&viommu->iommu); iommu_device_unregister(&viommu->iommu); + viommu_clear_fwnode(viommu); +err_sysfs_remove: + iommu_device_sysfs_remove(&viommu->iommu); err_free_vqs: vdev->config->del_vqs(vdev); @@ -1131,8 +1176,9 @@ static void viommu_remove(struct virtio_device *vdev) { struct viommu_dev *viommu = vdev->priv; - iommu_device_sysfs_remove(&viommu->iommu); iommu_device_unregister(&viommu->iommu); + viommu_clear_fwnode(viommu); + iommu_device_sysfs_remove(&viommu->iommu); /* Stop all virtqueues */ 
vdev->config->reset(vdev); -- 2.24.0
When the virtio-iommu uses the PCI transport and the topology is described with IORT, register the PCI fwnode with IORT. Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org> --- drivers/iommu/virtio-iommu.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 8efa368134c0..9847552faecc 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -7,6 +7,7 @@ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt +#include <linux/acpi_iort.h> #include <linux/amba/bus.h> #include <linux/delay.h> #include <linux/dma-iommu.h> @@ -989,6 +990,8 @@ static int viommu_set_fwnode(struct viommu_dev *viommu) set_primary_fwnode(dev, fwnode); } + /* Tell IORT about a PCI device's fwnode */ + iort_iommu_update_fwnode(dev, dev->fwnode); iommu_device_set_fwnode(&viommu->iommu, dev->fwnode); return 0; } @@ -1000,6 +1003,8 @@ static void viommu_clear_fwnode(struct viommu_dev *viommu) if (!dev->fwnode) return; + iort_iommu_update_fwnode(dev, NULL); + if (is_software_node(dev->fwnode)) { struct fwnode_handle *fwnode = dev->fwnode; -- 2.24.0
Add support for a new ACPI table that embeds other tables describing a platform's IOMMU topology. Currently the only supported base table is IORT. The VIOT contains an IORT with additional node types, that describe a virtio-iommu. Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org> --- drivers/acpi/Kconfig | 4 ++++ drivers/acpi/Makefile | 1 + drivers/acpi/bus.c | 2 ++ drivers/acpi/tables.c | 2 +- drivers/acpi/viot.c | 44 +++++++++++++++++++++++++++++++++++++++ drivers/iommu/Kconfig | 1 + include/linux/acpi_viot.h | 20 ++++++++++++++++++ 7 files changed, 73 insertions(+), 1 deletion(-) create mode 100644 drivers/acpi/viot.c create mode 100644 include/linux/acpi_viot.h diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 548976c8b2b0..513a5e4d3526 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -579,6 +579,10 @@ config TPS68470_PMIC_OPREGION config ACPI_IORT bool +config ACPI_VIOT + bool + select ACPI_IORT + endif # ACPI config X86_PM_TIMER diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 9d1792165713..6abdc6cc32c7 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -124,3 +124,4 @@ obj-y += dptf/ obj-$(CONFIG_ARM64) += arm64/ obj-$(CONFIG_ACPI_IORT) += iort.o +obj-$(CONFIG_ACPI_VIOT) += viot.o diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c index 48bc96d45bab..6f364e0c9240 100644 --- a/drivers/acpi/bus.c +++ b/drivers/acpi/bus.c @@ -25,6 +25,7 @@ #include <linux/dmi.h> #endif #include <linux/acpi_iort.h> +#include <linux/acpi_viot.h> #include <linux/pci.h> #include <acpi/apei.h> #include <linux/suspend.h> @@ -1246,6 +1247,7 @@ static int __init acpi_init(void) pci_mmcfg_late_init(); acpi_iort_init(); + acpi_viot_init(); acpi_scan_init(); acpi_ec_init(); acpi_debugfs_init(); diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c index 180ac4329763..9662ea5e1064 100644 --- a/drivers/acpi/tables.c +++ b/drivers/acpi/tables.c @@ -501,7 +501,7 @@ static const char * const table_sigs[] = { 
ACPI_SIG_WDDT, ACPI_SIG_WDRT, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_PSDT, ACPI_SIG_RSDT, ACPI_SIG_XSDT, ACPI_SIG_SSDT, ACPI_SIG_IORT, ACPI_SIG_NFIT, ACPI_SIG_HMAT, ACPI_SIG_PPTT, - NULL }; + ACPI_SIG_VIOT, NULL }; #define ACPI_HEADER_SIZE sizeof(struct acpi_table_header) diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c new file mode 100644 index 000000000000..ab9a6e43ad9b --- /dev/null +++ b/drivers/acpi/viot.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2019 Linaro + * + * Virtual IOMMU table + */ +#define pr_fmt(fmt) "ACPI: VIOT: " fmt + +#include <linux/acpi.h> +#include <linux/acpi_iort.h> +#include <linux/acpi_viot.h> + +int __init acpi_viot_init(void) +{ + struct acpi_table_viot *viot; + struct acpi_table_header *acpi_header; + acpi_status status; + + status = acpi_get_table(ACPI_SIG_VIOT, 0, &acpi_header); + if (ACPI_FAILURE(status)) { + if (status != AE_NOT_FOUND) { + const char *msg = acpi_format_exception(status); + + pr_err("Failed to get table, %s\n", msg); + return -EINVAL; + } + + return 0; + } + + if (acpi_header->length < sizeof(*viot)) { + pr_err("VIOT table overflow, bad table!\n"); + return -EINVAL; + } + + viot = (struct acpi_table_viot *)acpi_header; + if (ACPI_COMPARE_NAMESEG(viot->base_table.signature, ACPI_SIG_IORT)) { + acpi_iort_register_table(&viot->base_table, IORT_SOURCE_VIOT); + return 0; + } + + pr_err("Unknown base table header\n"); + return -EINVAL; +} diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index e3842eabcfdd..e6eb4f238d1a 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -480,6 +480,7 @@ config VIRTIO_IOMMU depends on ARM64 select IOMMU_API select INTERVAL_TREE + select ACPI_VIOT if ACPI help Para-virtualised IOMMU driver with virtio. 
diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h new file mode 100644 index 000000000000..6c282d5eb793 --- /dev/null +++ b/include/linux/acpi_viot.h @@ -0,0 +1,20 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2019 Linaro + */ + +#ifndef __ACPI_VIOT_H__ +#define __ACPI_VIOT_H__ + +#ifdef CONFIG_ACPI_VIOT + +int acpi_viot_init(void); + +#else /* !CONFIG_ACPI_VIOT */ + +static inline int acpi_viot_init(void) +{ return 0; } + +#endif /* !CONFIG_ACPI_VIOT */ + +#endif /* __ACPI_VIOT_H__ */ -- 2.24.0
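acpi_viot_init() checks that the table is at least sizeof(*viot) bytes and then dispatches on the signature of the embedded base table. A userspace sketch of that check on a raw byte buffer follows. It assumes, as the viot->base_table access suggests, that the base table's 36-byte ACPI header immediately follows the VIOT's own header; the draft VIOT specification is authoritative for the real layout. The names viot_classify() and enum viot_result are hypothetical, and the checksum test is an addition not present in the patch: all bytes of an ACPI table, checksum byte included, sum to zero modulo 256.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACPI_HDR_LEN 36 /* sizeof(struct acpi_table_header) */

enum viot_result { VIOT_IORT, VIOT_TOO_SHORT, VIOT_BAD_CHECKSUM, VIOT_UNKNOWN_BASE };

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* The bytes of an ACPI table, checksum byte included, sum to 0 mod 256. */
static int acpi_checksum_ok(const uint8_t *tbl, size_t len)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = (uint8_t)(sum + tbl[i]);
	return sum == 0;
}

/*
 * Assumed layout: VIOT header at offset 0, base table header at offset 36.
 * buf_len is the number of bytes actually available in tbl.
 */
static enum viot_result viot_classify(const uint8_t *tbl, size_t buf_len)
{
	uint32_t len;

	assert(tbl);
	if (buf_len < 2 * ACPI_HDR_LEN)
		return VIOT_TOO_SHORT;
	len = get_le32(tbl + 4); /* header 'length' covers the whole table */
	if (len < 2 * ACPI_HDR_LEN || len > buf_len)
		return VIOT_TOO_SHORT;
	if (!acpi_checksum_ok(tbl, len))
		return VIOT_BAD_CHECKSUM;
	if (memcmp(tbl + ACPI_HDR_LEN, "IORT", 4) == 0)
		return VIOT_IORT;
	return VIOT_UNKNOWN_BASE;
}
```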
Jean-Philippe Brucker
2019-Nov-22 10:49 UTC
[RFC virtio 12/13] virtio-iommu: Add built-in topology description
Add a lightweight method to describe the IOMMU topology in the config space, guarded by a new feature bit. A list of capabilities in the config space describes the devices managed by the IOMMU and their endpoint IDs. Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org> --- virtio-iommu.tex | 88 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 88 insertions(+) diff --git a/virtio-iommu.tex b/virtio-iommu.tex index 28c562b..2b29873 100644 --- a/virtio-iommu.tex +++ b/virtio-iommu.tex @@ -67,6 +67,9 @@ \subsection{Feature bits}\label{sec:Device Types / IOMMU Device / Feature bits} \item[VIRTIO_IOMMU_F_MMIO (5)] The VIRTIO_IOMMU_MAP_F_MMIO flag is available. + +\item[VIRTIO_IOMMU_F_TOPOLOGY (6)] + Topology description is available at \field{topo_offset}. \end{description} \drivernormative{\subsubsection}{Feature bits}{Device Types / IOMMU Device / Feature bits} @@ -97,6 +100,7 @@ \subsection{Device configuration layout}\label{sec:Device Types / IOMMU Device / le32 end; } domain_range; le32 probe_size; + le16 topo_offset; }; \end{lstlisting} @@ -141,6 +145,90 @@ \subsection{Device initialization}\label{sec:Device Types / IOMMU Device / Devic If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the device SHOULD NOT let endpoints access the guest-physical address space. +\subsubsection{Built-in topology description}\label{sec:Device Types / IOMMU Device / Device initialization / topology} + +The device manages memory accesses from endpoints, identified by endpoint +IDs. The driver can discover which endpoint ID corresponds to an endpoint +using several methods, depending on the platform. Platforms described +with device tree use the \texttt{iommus} and \texttt{iommu-map} properties +embedded into device nodes for this purpose. Platforms described with +ACPI use a table such as the Virtual I/O Table. Platforms that do not +support either device tree or ACPI may embed a minimalistic description +in the device configuration space. 
+ +An important disadvantage of describing the topology from within the +device is the lack of initialization ordering information. Out-of-band +descriptions such as device tree and ACPI let the operating system know +about device dependencies so that it can initialize supplier devices +(IOMMUs) before their consumers (endpoints). Platforms using the +VIRTIO_IOMMU_F_TOPOLOGY feature have to communicate the device dependency +in another way. + +If the VIRTIO_IOMMU_F_TOPOLOGY feature is negotiated, \field{topo_offset} +is the offset between the beginning of the device-specific configuration +space (virtio_iommu_config) and the first topology structure header. A +topology structure defines the endpoint ID of one or more endpoints +managed by the virtio-iommu device. + +\begin{lstlisting} +struct virtio_iommu_topo_head { + le16 type; + le16 next; +}; +\end{lstlisting} + +\field{next} is the offset between the beginning of the device-specific +configuration space and the next topology structure header. When +\field{next} is zero, this is the last structure. + +\field{type} describes the type of structure: +\begin{description} + \item[VIRTIO_IOMMU_TOPO_PCI_RANGE (0)] struct virtio_iommu_topo_pci_range + \item[VIRTIO_IOMMU_TOPO_ENDPOINT (1)] struct virtio_iommu_topo_endpoint +\end{description} + +\paragraph{PCI range}\label{sec:Device Types / IOMMU Device / Device initialization / topology / PCI range} + +\begin{lstlisting} +struct virtio_iommu_topo_pci_range { + struct virtio_iommu_topo_head head; + le32 endpoint_start; + le16 hierarchy; + le16 requester_start; + le16 requester_end; + le16 reserved; +}; +\end{lstlisting} + +The PCI range structure describes the endpoint IDs of a series of PCI +devices. + +\begin{description} + \item[\field{hierarchy}] Identifier of the PCI hierarchy. Sometimes + called PCI segment or domain number. + \item[\field{requester_start}] First requester ID in the range. + \item[\field{requester_end}] Last requester ID in the range.
+ \item[\field{endpoint_start}] First endpoint ID. +\end{description} + +The correspondence between a PCI requester ID in the range +[ requester_start; requester_end ] and its endpoint IDs is a linear +transformation: endpoint_id = requester_id - requester_start + +endpoint_start. + +\paragraph{Single endpoint}\label{sec:Device Types / IOMMU Device / Device initialization / topology / Single endpoint} + +\begin{lstlisting} +struct virtio_iommu_topo_endpoint { + struct virtio_iommu_topo_head head; + le32 endpoint; + le64 address; +}; +\end{lstlisting} + +\field{endpoint} is the ID of a single endpoint, identified by its first +MMIO address in the physical address space. + \subsection{Device operations}\label{sec:Device Types / IOMMU Device / Device operations} Driver send requests on the request virtqueue, notifies the device and -- 2.24.0
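The list walk and the PCI range translation described in this patch can be sketched in userspace C as follows. The function operates on a little-endian copy of the device-specific configuration space; topo_pci_to_eid() and the rd16()/rd32() helpers are hypothetical names, and the 16-byte structure size follows from the two struct definitions (4-byte header plus 12 bytes of payload). Unknown structure types are skipped, and a hop counter bounds the walk so that a malformed, cyclic next chain cannot loop forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOPO_PCI_RANGE   0  /* VIRTIO_IOMMU_TOPO_PCI_RANGE */
#define TOPO_STRUCT_SIZE 16 /* both structure types: 4-byte head + 12 bytes */

static uint16_t rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)rd16(p) | (uint32_t)rd16(p + 2) << 16;
}

/*
 * Follow the 'next' chain from topo_offset and translate a PCI requester ID
 * into an endpoint ID. cfg is a little-endian copy of the device-specific
 * config space of size len. Returns false if no PCI range matches.
 */
static bool topo_pci_to_eid(const uint8_t *cfg, size_t len, uint16_t topo_offset,
			    uint16_t hierarchy, uint16_t rid, uint32_t *eid)
{
	size_t off = topo_offset;

	assert(cfg && eid);
	/* A 'next' of zero ends the list; the hop limit stops cyclic chains. */
	for (size_t hops = 0; off != 0 && hops <= len; hops++) {
		const uint8_t *s;

		if (off + TOPO_STRUCT_SIZE > len)
			return false; /* structure would run past the config space */
		s = cfg + off;

		if (rd16(s) == TOPO_PCI_RANGE) {
			uint32_t endpoint_start = rd32(s + 4);
			uint16_t seg = rd16(s + 8);
			uint16_t start = rd16(s + 10);
			uint16_t end = rd16(s + 12);

			if (seg == hierarchy && rid >= start && rid <= end) {
				/* endpoint_id = requester_id - requester_start + endpoint_start */
				*eid = (uint32_t)(rid - start) + endpoint_start;
				return true;
			}
		}
		/* Other types, e.g. single endpoints, are not PCI: skip them. */
		off = rd16(s + 2);
	}
	return false;
}
```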
Jean-Philippe Brucker
2019-Nov-22 10:50 UTC
[RFC 13/13] iommu/virtio: Add topology description to
Some hypervisors don't implement either device-tree or ACPI, but still need a method to describe the IOMMU topology. Read the virtio-iommu config early and parse the topology description. Hook into the dma_setup() callbacks to initialize the IOMMU before probing endpoints. If the virtio-iommu uses the virtio-pci transport, this will only work if the PCI root complex is the first device probed. We don't currently support virtio-mmio. Initially I tried to generate a fake IORT table and feed it to the IORT driver, in order to avoid rewriting the whole DMA code, but it wouldn't work with platform endpoints, which are references to items in the ACPI table on IORT. Signed-off-by: Eric Auger <eric.auger at redhat.com> Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org> --- Note that we only call virt_dma_configure() if the host didn't provide either DT or ACPI method. If you want to test this with QEMU, you'll need to manually disable the acpi_dma_configure() part in pci-driver.c --- drivers/base/platform.c | 3 + drivers/iommu/Kconfig | 9 + drivers/iommu/Makefile | 1 + drivers/iommu/virtio-iommu-topology.c | 410 ++++++++++++++++++++++++++ drivers/iommu/virtio-iommu.c | 3 + drivers/pci/pci-driver.c | 3 + include/linux/virtio_iommu.h | 18 ++ include/uapi/linux/virtio_iommu.h | 26 ++ 8 files changed, 473 insertions(+) create mode 100644 drivers/iommu/virtio-iommu-topology.c create mode 100644 include/linux/virtio_iommu.h diff --git a/drivers/base/platform.c b/drivers/base/platform.c index b230beb6ccb4..70b12c8ef2fb 100644 --- a/drivers/base/platform.c +++ b/drivers/base/platform.c @@ -27,6 +27,7 @@ #include <linux/limits.h> #include <linux/property.h> #include <linux/kmemleak.h> +#include <linux/virtio_iommu.h> #include "base.h" #include "power/power.h" @@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev) } else if (has_acpi_companion(dev)) { attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode)); ret = acpi_dma_configure(dev, attr); + } 
else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) { + ret = virt_dma_configure(dev); } return ret; diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index e6eb4f238d1a..d02c0d36019d 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -486,4 +486,13 @@ config VIRTIO_IOMMU Say Y here if you intend to run this kernel as a guest. +config VIRTIO_IOMMU_TOPOLOGY + bool "Topology properties for the virtio-iommu" + depends on VIRTIO_IOMMU + help + Enable early probing of the virtio-iommu device, to detect the + topology description. + + Say Y here if you intend to run this kernel as a guest. + endif # IOMMU_SUPPORT diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 4f405f926e73..6b51c4186ebc 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o diff --git a/drivers/iommu/virtio-iommu-topology.c b/drivers/iommu/virtio-iommu-topology.c new file mode 100644 index 000000000000..ec22510ace3d --- /dev/null +++ b/drivers/iommu/virtio-iommu-topology.c @@ -0,0 +1,410 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/acpi.h> +#include <linux/acpi_iort.h> +#include <linux/dma-iommu.h> +#include <linux/iommu.h> +#include <linux/list.h> +#include <linux/pci.h> +#include <linux/printk.h> +#include <linux/virtio_config.h> +#include <linux/virtio_iommu.h> +#include <linux/virtio_pci.h> +#include <uapi/linux/virtio_iommu.h> + +struct viommu_cap_config { + u8 pos; /* PCI capability position */ + u8 bar; + u32 length; /* structure size */ + u32 offset; /* structure offset within the bar */ +}; + +struct viommu_spec { + struct device *dev; /* transport device */ + struct fwnode_handle *fwnode; + struct iommu_ops *ops; + struct list_head topology; + struct list_head list; +}; + +struct 
viommu_topology { + union { + struct virtio_iommu_topo_head head; + struct virtio_iommu_topo_pci_range pci; + struct virtio_iommu_topo_endpoint ep; + }; + /* Index into viommu_spec->topology */ + struct list_head list; +}; + +static LIST_HEAD(viommus); +static DEFINE_MUTEX(viommus_lock); + +#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field) + +static inline int viommu_find_capability(struct pci_dev *dev, u8 cfg_type, + struct viommu_cap_config *cap) +{ + int pos; + u8 bar; + + for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR); + pos > 0; + pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) { + u8 type; + + pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), &type); + if (type != cfg_type) + continue; + + pci_read_config_byte(dev, pos + VPCI_FIELD(bar), &bar); + + /* Ignore structures with reserved BAR values */ + if (type != VIRTIO_PCI_CAP_PCI_CFG && bar > 0x5) + continue; + + cap->bar = bar; + cap->pos = pos; + pci_read_config_dword(dev, pos + VPCI_FIELD(length), + &cap->length); + pci_read_config_dword(dev, pos + VPCI_FIELD(offset), + &cap->offset); + + return pos; + } + return 0; +} + +/* + * Setup the special virtio PCI capability to read one of the config registers + */ +static int viommu_switch_pci_cfg(struct pci_dev *dev, int cfg, + struct viommu_cap_config *cap, u32 length, + u32 offset) +{ + offset += cap->offset; + + if (offset + length > cap->offset + cap->length) { + dev_warn(&dev->dev, + "read of %d bytes at offset 0x%x overflows cap of size %d\n", + length, offset, cap->length); + return -EOVERFLOW; + } + + pci_write_config_byte(dev, cfg + VPCI_FIELD(bar), cap->bar); + pci_write_config_dword(dev, cfg + VPCI_FIELD(length), length); + pci_write_config_dword(dev, cfg + VPCI_FIELD(offset), offset); + return 0; +} + +static u32 viommu_cread(struct pci_dev *dev, int cfg, + struct viommu_cap_config *cap, u32 length, u32 offset) +{ + u8 val8; + u16 val16; + u32 val32; + int out = cfg + sizeof(struct virtio_pci_cap); + + if 
(viommu_switch_pci_cfg(dev, cfg, cap, length, offset)) + return 0; + + switch (length) { + case 1: + pci_read_config_byte(dev, out, &val8); + return val8; + case 2: + pci_read_config_word(dev, out, &val16); + return val16; + case 4: + pci_read_config_dword(dev, out, &val32); + return val32; + default: + WARN_ON(1); + return 0; + } +} + +static void viommu_cwrite(struct pci_dev *dev, int cfg, + struct viommu_cap_config *cap, u32 length, u32 offset, + u32 val) +{ + int out = cfg + sizeof(struct virtio_pci_cap); + + if (viommu_switch_pci_cfg(dev, cfg, cap, length, offset)) + return; + + switch (length) { + case 1: + pci_write_config_byte(dev, out, (u8)val); + break; + case 2: + pci_write_config_word(dev, out, (u16)val); + break; + case 4: + pci_write_config_dword(dev, out, val); + break; + default: + WARN_ON(1); + } +} + +static int viommu_add_topology(struct viommu_spec *viommu_spec, + struct viommu_topology *cap) +{ + struct viommu_topology *new = kmemdup(cap, sizeof(*cap), GFP_KERNEL); + + if (!new) + return -ENOMEM; + + mutex_lock(&viommus_lock); + list_add(&new->list, &viommu_spec->topology); + mutex_unlock(&viommus_lock); + return 0; +} + +static int viommu_parse_topology(struct pci_dev *dev, int pci_cfg, + struct viommu_cap_config *dev_cfg) +{ + u32 offset; + struct viommu_topology cap; + struct viommu_spec *viommu_spec; + int iter = 0; /* Protects against config loop */ + + offset = viommu_cread(dev, pci_cfg, dev_cfg, 2, + offsetof(struct virtio_iommu_config, + topo_offset)); + if (!offset) + return 0; + + viommu_spec = kzalloc(sizeof(*viommu_spec), GFP_KERNEL); + if (!viommu_spec) + return -ENOMEM; + + INIT_LIST_HEAD(&viommu_spec->topology); + viommu_spec->dev = &dev->dev; + + while (offset >= sizeof(struct virtio_iommu_config) && ++iter < 0x10000) { + memset(&cap, 0, sizeof(cap)); + + cap.head.type = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset); + cap.head.next = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset + 2); + + switch (cap.head.type) { + case 
VIRTIO_IOMMU_TOPO_PCI_RANGE: + cap.pci.endpoint_start = viommu_cread(dev, pci_cfg, + dev_cfg, 4, offset + + 4); + cap.pci.hierarchy = viommu_cread(dev, pci_cfg, dev_cfg, + 2, offset + 8); + cap.pci.requester_start = viommu_cread(dev, pci_cfg, + dev_cfg, 2, + offset + 10); + cap.pci.requester_end = viommu_cread(dev, pci_cfg, + dev_cfg, 2, offset + + 12); + dev_info(&dev->dev, + "topology: adding PCI range 0x%x [0x%x:0x%x] -> 0x%x\n", + cap.pci.hierarchy, cap.pci.requester_start, + cap.pci.requester_end, cap.pci.endpoint_start); + if (viommu_add_topology(viommu_spec, &cap)) + return -ENOMEM; + break; + case VIRTIO_IOMMU_TOPO_ENDPOINT: + cap.ep.endpoint = viommu_cread(dev, pci_cfg, dev_cfg, 4, + offset + 4); + cap.ep.address = viommu_cread(dev, pci_cfg, dev_cfg, 4, offset + 8) | + (u64)viommu_cread(dev, pci_cfg, dev_cfg, 4, offset + 12) << 32; + dev_info(&dev->dev, + "topology: adding endpoint 0x%llx -> 0x%x\n", + cap.ep.address, cap.ep.endpoint); + if (viommu_add_topology(viommu_spec, &cap)) + return -ENOMEM; + break; + default: + dev_warn(&dev->dev, "Unknown topo structure 0x%x\n", + cap.head.type); + break; + } + + offset = cap.head.next; + } + + /* TODO: handle device removal */ + mutex_lock(&viommus_lock); + list_add(&viommu_spec->list, &viommus); + mutex_unlock(&viommus_lock); + + return 0; +} + +static void viommu_pci_parse_topology(struct pci_dev *dev) +{ + int pos; + u32 features; + struct viommu_cap_config common = {0}; + struct viommu_cap_config pci_cfg = {0}; + struct viommu_cap_config dev_cfg = {0}; + + pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_COMMON_CFG, &common); + if (!pos) { + dev_warn(&dev->dev, "common capability not found\n"); + return; + } + pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_DEVICE_CFG, &dev_cfg); + if (!pos) { + dev_warn(&dev->dev, "device config capability not found\n"); + return; + } + pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_PCI_CFG, &pci_cfg); + if (!pos) { + dev_warn(&dev->dev, "PCI config capability not found\n"); + return; + } + + /* Find out if the device supports
topology description */ + viommu_cwrite(dev, pos, &common, 4, + offsetof(struct virtio_pci_common_cfg, + device_feature_select), + 0); + features = viommu_cread(dev, pos, &common, 4, + offsetof(struct virtio_pci_common_cfg, + device_feature)); + if (!(features & BIT(VIRTIO_IOMMU_F_TOPOLOGY))) { + dev_dbg(&dev->dev, "device doesn't have topology description\n"); + return; + } + + viommu_parse_topology(dev, pos, &dev_cfg); +} + +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1014, + viommu_pci_parse_topology); + +static const struct iommu_ops *virt_iommu_setup(struct device *dev) +{ + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + const struct iommu_ops *viommu_ops = NULL; + struct fwnode_handle *viommu_fwnode; + struct viommu_spec *viommu_spec; + struct viommu_topology *topo; + struct pci_dev *pdev = NULL; + struct device *viommu_dev; + bool found = false; + u16 devid; + u32 eid; + int ret; + + /* Already translated? */ + if (fwspec && fwspec->ops) + return fwspec->ops; + + if (dev_is_pci(dev)) { + pdev = to_pci_dev(dev); + devid = pci_dev_id(pdev); + } else { + /* TODO: Do something with devres */ + return NULL; + } + + mutex_lock(&viommus_lock); + list_for_each_entry(viommu_spec, &viommus, list) { + list_for_each_entry(topo, &viommu_spec->topology, list) { + if (pdev && + topo->head.type == VIRTIO_IOMMU_TOPO_PCI_RANGE && + pci_domain_nr(pdev->bus) == topo->pci.hierarchy && + devid >= topo->pci.requester_start && + devid <= topo->pci.requester_end) { + found = true; + eid = devid - topo->pci.requester_start + + topo->pci.endpoint_start; + break; + } else if (!pdev) { + /* TODO: compare address with devres */ + } + } + if (found) { + viommu_ops = viommu_spec->ops; + viommu_fwnode = viommu_spec->fwnode; + viommu_dev = viommu_spec->dev; + break; + } + } + mutex_unlock(&viommus_lock); + if (!found) + return NULL; + + /* We're not translating ourselves, that would be silly.
*/ + if (viommu_dev == dev) + return NULL; + + if (!viommu_ops) + return ERR_PTR(-EPROBE_DEFER); + + ret = iommu_fwspec_init(dev, viommu_fwnode, viommu_ops); + if (ret) + return ERR_PTR(ret); + + iommu_fwspec_add_ids(dev, &eid, 1); + + return viommu_ops; +} + +/** + * virt_dma_configure - Configure DMA of virtualized devices + * @dev: the endpoint + * + * An alternative to the ACPI and DT methods to setup DMA and the IOMMU ops of a + * virtual device. + * + * Return: -EPROBE_DEFER if the IOMMU hasn't been loaded yet, 0 otherwise + */ +int virt_dma_configure(struct device *dev) +{ + const struct iommu_ops *iommu_ops; + + /* TODO: do we need to mess about with the dma_mask as well? */ + WARN_ON(!dev->dma_mask); + + iommu_ops = virt_iommu_setup(dev); + if (IS_ERR(iommu_ops)) { + if (PTR_ERR(iommu_ops) == -EPROBE_DEFER) + return -EPROBE_DEFER; + iommu_ops = NULL; + } + + /* + * If we have reason to believe the IOMMU driver missed the initial + * add_device callback for dev, replay it to get things in order. + */ + if (iommu_ops && dev->bus && !device_iommu_mapped(dev)) + iommu_probe_device(dev); + +#ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS + /* Assume coherent, as well as full 64-bit addresses. */ + arch_setup_dma_ops(dev, 0, ~0UL, iommu_ops, true); +#else + if (iommu_ops) + iommu_setup_dma_ops(dev, 0, ~0UL); +#endif + return 0; +} + +/** + * virt_set_iommu_ops - Set the IOMMU ops of a virtual IOMMU device + * + * Setup the iommu_ops associated to a viommu_spec, once the driver is loaded + * and the device probed. + */ +void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops) +{ + struct viommu_spec *viommu_spec; + + mutex_lock(&viommus_lock); + list_for_each_entry(viommu_spec, &viommus, list) { + if (viommu_spec->dev == dev) { + viommu_spec->ops = ops; + viommu_spec->fwnode = ops ? 
dev->fwnode : NULL; + break; + } + } + mutex_unlock(&viommus_lock); +} diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 9847552faecc..f68ee9615b38 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -22,6 +22,7 @@ #include <linux/virtio.h> #include <linux/virtio_config.h> #include <linux/virtio_ids.h> +#include <linux/virtio_iommu.h> #include <linux/wait.h> #include <uapi/linux/virtio_iommu.h> @@ -1134,6 +1135,7 @@ static int viommu_probe(struct virtio_device *vdev) if (ret) goto err_sysfs_remove; + virt_set_iommu_ops(dev->parent, &viommu_ops); iommu_device_set_ops(&viommu->iommu, &viommu_ops); iommu_device_register(&viommu->iommu); @@ -1182,6 +1184,7 @@ static void viommu_remove(struct virtio_device *vdev) struct viommu_dev *viommu = vdev->priv; iommu_device_unregister(&viommu->iommu); + virt_set_iommu_ops(vdev->dev.parent, NULL); viommu_clear_fwnode(viommu); iommu_device_sysfs_remove(&viommu->iommu); diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index a8124e47bf6e..d9b5e902ad18 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -17,6 +17,7 @@ #include <linux/suspend.h> #include <linux/kexec.h> #include <linux/of_device.h> +#include <linux/virtio_iommu.h> #include <linux/acpi.h> #include "pci.h" #include "pcie/portdrv.h" @@ -1633,6 +1634,8 @@ static int pci_dma_configure(struct device *dev) struct acpi_device *adev = to_acpi_device_node(bridge->fwnode); ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev)); + } else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) { + ret = virt_dma_configure(dev); } pci_put_host_bridge_device(bridge); diff --git a/include/linux/virtio_iommu.h b/include/linux/virtio_iommu.h new file mode 100644 index 000000000000..b700256f1063 --- /dev/null +++ b/include/linux/virtio_iommu.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef VIRTIO_IOMMU_H_ +#define VIRTIO_IOMMU_H_ + +#if IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY) +int 
virt_dma_configure(struct device *dev); +void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops); +#else /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY */ +static inline int virt_dma_configure(struct device *dev) +{ + return -ENODEV; +} + +static inline void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops) +{ } +#endif /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY */ + +#endif /* VIRTIO_IOMMU_H_ */ diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h index 237e36a280cb..d3b7cd2a076f 100644 --- a/include/uapi/linux/virtio_iommu.h +++ b/include/uapi/linux/virtio_iommu.h @@ -16,6 +16,7 @@ #define VIRTIO_IOMMU_F_BYPASS 3 #define VIRTIO_IOMMU_F_PROBE 4 #define VIRTIO_IOMMU_F_MMIO 5 +#define VIRTIO_IOMMU_F_TOPOLOGY 6 struct virtio_iommu_range_64 { __le64 start; @@ -36,6 +37,31 @@ struct virtio_iommu_config { struct virtio_iommu_range_32 domain_range; /* Probe buffer size */ __le32 probe_size; + /* Offset to the beginning of the topology table */ + __le16 topo_offset; +}; + +struct virtio_iommu_topo_head { + __le16 type; + __le16 next; +}; + +#define VIRTIO_IOMMU_TOPO_PCI_RANGE 0x0 +#define VIRTIO_IOMMU_TOPO_ENDPOINT 0x1 + +struct virtio_iommu_topo_pci_range { + struct virtio_iommu_topo_head head; + __le32 endpoint_start; + __le16 hierarchy; + __le16 requester_start; + __le16 requester_end; + __le16 reserved; +}; + +struct virtio_iommu_topo_endpoint { + struct virtio_iommu_topo_head head; + __le32 endpoint; + __le64 address; }; /* Request types */ -- 2.24.0
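Two details of the config accesses in this patch are easy to get wrong: feature constants such as VIRTIO_IOMMU_F_TOPOLOGY are bit positions rather than masks, and the VIRTIO_PCI_CAP_PCI_CFG window transfers 1, 2 or 4 bytes per access, so a le64 field such as address needs two reads. The sketch below models that window over a plain byte array; struct cfg_window, win_read(), win_read64() and has_feature() are hypothetical and only imitate the behaviour of viommu_cread(), including its return of 0 for an out-of-bounds read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_IOMMU_F_TOPOLOGY 6 /* bit number, as in the uapi header */

/* Hypothetical model of the PCI config window: 1, 2 or 4 bytes per access. */
struct cfg_window {
	const uint8_t *data; /* little-endian device-specific config */
	size_t len;
};

static uint32_t win_read(const struct cfg_window *w, unsigned int length,
			 size_t offset)
{
	uint32_t val = 0;

	assert(length == 1 || length == 2 || length == 4);
	if (offset + length > w->len)
		return 0; /* like viommu_cread(), an overflowing read yields 0 */
	for (unsigned int i = 0; i < length; i++)
		val |= (uint32_t)w->data[offset + i] << (8 * i);
	return val;
}

/* A le64 field needs two 32-bit accesses, low word first. */
static uint64_t win_read64(const struct cfg_window *w, size_t offset)
{
	return (uint64_t)win_read(w, 4, offset) |
	       (uint64_t)win_read(w, 4, offset + 4) << 32;
}

/* Feature constants are bit positions, so test against a shifted mask. */
static bool has_feature(uint32_t features_lo, unsigned int bit)
{
	return bit < 32 && (features_lo & (UINT32_C(1) << bit)) != 0;
}
```

Testing `features & 6` instead would check bits 1 and 2, which belong to other features.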
Michael S. Tsirkin
2019-Nov-22 12:53 UTC
[RFC 13/13] iommu/virtio: Add topology description to
On Fri, Nov 22, 2019 at 11:50:00AM +0100, Jean-Philippe Brucker wrote:
> Some hypervisors don't implement either device-tree or ACPI, but still
> need a method to describe the IOMMU topology. Read the virtio-iommu
> config early and parse the topology description. Hook into the
> dma_setup() callbacks to initialize the IOMMU before probing endpoints.
>
> If the virtio-iommu uses the virtio-pci transport, this will only work
> if the PCI root complex is the first device probed. We don't currently
> support virtio-mmio.
>
> Initially I tried to generate a fake IORT table and feed it to the IORT
> driver, in order to avoid rewriting the whole DMA code, but it wouldn't
> work with platform endpoints, which are references to items in the ACPI
> table on IORT.
>
> Signed-off-by: Eric Auger <eric.auger at redhat.com>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe at linaro.org>

Overall this looks good to me. The only point is that
I think the way the interface is designed makes writing
the driver a bit too difficult. Idea: if instead we just
have a length field and then an array of records
(preferably unions so we don't need to work hard),
we can shadow that into memory, then iterate over
the unions.

Maybe add a uniform record length + number of records field.
Then just skip types you do not know how to handle.
This will also help make sure it's within bounds.

What do you think?

You will need to do something to address the TODO I think.

> ---
> Note that we only call virt_dma_configure() if the host didn't provide
> either DT or ACPI method. If you want to test this with QEMU, you'll
> need to manually disable the acpi_dma_configure() part in pci-driver.c
> ---
>  drivers/base/platform.c               |   3 +
>  drivers/iommu/Kconfig                 |   9 +
>  drivers/iommu/Makefile                |   1 +
>  drivers/iommu/virtio-iommu-topology.c | 410 ++++++++++++++++++++++++++
>  drivers/iommu/virtio-iommu.c          |   3 +
>  drivers/pci/pci-driver.c              |   3 +
>  include/linux/virtio_iommu.h          |  18 ++
>  include/uapi/linux/virtio_iommu.h     |  26 ++
>  8 files changed, 473 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virtio_iommu.h
>
> diff --git a/drivers/base/platform.c b/drivers/base/platform.c
> index b230beb6ccb4..70b12c8ef2fb 100644
> --- a/drivers/base/platform.c
> +++ b/drivers/base/platform.c
> @@ -27,6 +27,7 @@
>  #include <linux/limits.h>
>  #include <linux/property.h>
>  #include <linux/kmemleak.h>
> +#include <linux/virtio_iommu.h>
>
>  #include "base.h"
>  #include "power/power.h"
> @@ -1257,6 +1258,8 @@ int platform_dma_configure(struct device *dev)
>  	} else if (has_acpi_companion(dev)) {
>  		attr = acpi_get_dma_attr(to_acpi_device_node(dev->fwnode));
>  		ret = acpi_dma_configure(dev, attr);
> +	} else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
> +		ret = virt_dma_configure(dev);
>  	}
>
>  	return ret;
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index e6eb4f238d1a..d02c0d36019d 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -486,4 +486,13 @@ config VIRTIO_IOMMU
>
>  	  Say Y here if you intend to run this kernel as a guest.
>
> +config VIRTIO_IOMMU_TOPOLOGY
> +	bool "Topology properties for the virtio-iommu"
> +	depends on VIRTIO_IOMMU
> +	help
> +	  Enable early probing of the virtio-iommu device, to detect the
> +	  topology description.
> +
> +	  Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 4f405f926e73..6b51c4186ebc 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -35,3 +35,4 @@ obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY) += virtio-iommu-topology.o
> diff --git a/drivers/iommu/virtio-iommu-topology.c b/drivers/iommu/virtio-iommu-topology.c
> new file mode 100644
> index 000000000000..ec22510ace3d
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu-topology.c
> @@ -0,0 +1,410 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/acpi.h>
> +#include <linux/acpi_iort.h>
> +#include <linux/dma-iommu.h>
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +#include <linux/pci.h>
> +#include <linux/printk.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_iommu.h>
> +#include <linux/virtio_pci.h>
> +#include <uapi/linux/virtio_iommu.h>
> +
> +struct viommu_cap_config {
> +	u8 pos; /* PCI capability position */
> +	u8 bar;
> +	u32 length; /* structure size */
> +	u32 offset; /* structure offset within the bar */
> +};
> +
> +struct viommu_spec {
> +	struct device *dev; /* transport device */
> +	struct fwnode_handle *fwnode;
> +	struct iommu_ops *ops;
> +	struct list_head topology;
> +	struct list_head list;
> +};
> +
> +struct viommu_topology {
> +	union {
> +		struct virtio_iommu_topo_head head;
> +		struct virtio_iommu_topo_pci_range pci;
> +		struct virtio_iommu_topo_endpoint ep;
> +	};
> +	/* Index into viommu_spec->topology */
> +	struct list_head list;
> +};
> +
> +static LIST_HEAD(viommus);
> +static DEFINE_MUTEX(viommus_lock);
> +
> +#define VPCI_FIELD(field) offsetof(struct virtio_pci_cap, field)
> +
> +static inline int viommu_find_capability(struct pci_dev *dev, u8 cfg_type,
> +					 struct viommu_cap_config *cap)
> +{
> +	int pos;
> +	u8 bar;
> +
> +	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
> +	     pos > 0;
> +	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
> +		u8 type;
> +
> +		pci_read_config_byte(dev, pos + VPCI_FIELD(cfg_type), &type);
> +		if (type != cfg_type)
> +			continue;
> +
> +		pci_read_config_byte(dev, pos + VPCI_FIELD(bar), &bar);
> +
> +		/* Ignore structures with reserved BAR values */
> +		if (type != VIRTIO_PCI_CAP_PCI_CFG && bar > 0x5)
> +			continue;
> +
> +		cap->bar = bar;
> +		cap->pos = pos;
> +		pci_read_config_dword(dev, pos + VPCI_FIELD(length),
> +				      &cap->length);
> +		pci_read_config_dword(dev, pos + VPCI_FIELD(offset),
> +				      &cap->offset);
> +
> +		return pos;
> +	}
> +	return 0;
> +}
> +
> +/*
> + * Setup the special virtio PCI capability to read one of the config registers
> + */
> +static int viommu_switch_pci_cfg(struct pci_dev *dev, int cfg,
> +				 struct viommu_cap_config *cap, u32 length,
> +				 u32 offset)
> +{
> +	offset += cap->offset;
> +
> +	if (offset + length > cap->offset + cap->length) {
> +		dev_warn(&dev->dev,
> +			 "read of %d bytes at offset 0x%x overflows cap of size %d\n",
> +			 length, offset, cap->length);
> +		return -EOVERFLOW;
> +	}
> +
> +	pci_write_config_byte(dev, cfg + VPCI_FIELD(bar), cap->bar);
> +	pci_write_config_dword(dev, cfg + VPCI_FIELD(length), length);
> +	pci_write_config_dword(dev, cfg + VPCI_FIELD(offset), offset);
> +	return 0;
> +}
> +
> +static u32 viommu_cread(struct pci_dev *dev, int cfg,
> +			struct viommu_cap_config *cap, u32 length, u32 offset)
> +{
> +	u8 val8;
> +	u16 val16;
> +	u32 val32;
> +	int out = cfg + sizeof(struct virtio_pci_cap);
> +
> +	if (viommu_switch_pci_cfg(dev, cfg, cap, length, offset))
> +		return 0;
> +
> +	switch (length) {
> +	case 1:
> +		pci_read_config_byte(dev, out, &val8);
> +		return val8;
> +	case 2:
> +		pci_read_config_word(dev, out, &val16);
> +		return val16;
> +	case 4:
> +		pci_read_config_dword(dev, out, &val32);
> +		return val32;
> +	default:
> +		WARN_ON(1);
> +		return 0;
> +	}
> +}
> +
> +static void viommu_cwrite(struct pci_dev *dev, int cfg,
> +			  struct viommu_cap_config *cap, u32 length, u32 offset,
> +			  u32 val)

A single user with 4 byte parameter. Just open-code?

> +{
> +	int out = cfg + sizeof(struct virtio_pci_cap);
> +
> +	if (viommu_switch_pci_cfg(dev, cfg, cap, length, offset))
> +		return;
> +
> +	switch (length) {
> +	case 1:
> +		pci_write_config_byte(dev, out, (u8)val);
> +		break;
> +	case 2:
> +		pci_write_config_word(dev, out, (u16)val);
> +		break;
> +	case 4:
> +		pci_write_config_dword(dev, out, val);
> +		break;
> +	default:
> +		WARN_ON(1);
> +	}
> +}
> +
> +static int viommu_add_topology(struct viommu_spec *viommu_spec,
> +			       struct viommu_topology *cap)
> +{
> +	struct viommu_topology *new = kmemdup(cap, sizeof(*cap), GFP_KERNEL);
> +
> +	if (!new)
> +		return -ENOMEM;
> +
> +	mutex_lock(&viommus_lock);
> +	list_add(&new->list, &viommu_spec->topology);
> +	mutex_unlock(&viommus_lock);
> +	return 0;
> +}
> +
> +static int viommu_parse_topology(struct pci_dev *dev, int pci_cfg,
> +				 struct viommu_cap_config *dev_cfg)
> +{
> +	u32 offset;
> +	struct viommu_topology cap;
> +	struct viommu_spec *viommu_spec;
> +	int iter = 0; /* Protects against config loop */
> +
> +	offset = viommu_cread(dev, pci_cfg, dev_cfg, 2,
> +			      offsetof(struct virtio_iommu_config,
> +				       topo_offset));
> +	if (!offset)
> +		return 0;
> +
> +	viommu_spec = kzalloc(sizeof(*viommu_spec), GFP_KERNEL);
> +	if (!viommu_spec)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(&viommu_spec->topology);
> +	viommu_spec->dev = &dev->dev;
> +
> +	while (offset >= sizeof(struct virtio_iommu_config) && ++iter < 0x10000) {
> +		memset(&cap, 0, sizeof(cap));
> +
> +		cap.head.type = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset);
> +		cap.head.next = viommu_cread(dev, pci_cfg, dev_cfg, 2, offset + 2);

All of this doesn't seem to be endian-clean.
Try running sparse I think it will complain.

> +
> +		switch (cap.head.type) {
> +		case VIRTIO_IOMMU_TOPO_PCI_RANGE:
> +			cap.pci.endpoint_start = viommu_cread(dev, pci_cfg,
> +							      dev_cfg, 2, offset
> +							      + 4);
> +			cap.pci.hierarchy = viommu_cread(dev, pci_cfg, dev_cfg,
> +							 2, offset + 8);
> +			cap.pci.requester_start = viommu_cread(dev, pci_cfg,
> +							       dev_cfg, 2,
> +							       offset + 10);
> +			cap.pci.requester_end = viommu_cread(dev, pci_cfg,
> +							     dev_cfg, 2, offset +
> +							     12);
> +			dev_info(&dev->dev,
> +				 "topology: adding PCI range 0x%x [0x%x:0x%x] -> 0x%x\n",
> +				 cap.pci.hierarchy, cap.pci.requester_start,
> +				 cap.pci.requester_end, cap.pci.endpoint_start);
> +			if (viommu_add_topology(viommu_spec, &cap))
> +				return -ENOMEM;
> +			break;
> +		case VIRTIO_IOMMU_TOPO_ENDPOINT:
> +			cap.ep.endpoint = viommu_cread(dev, pci_cfg, dev_cfg, 2,
> +						       offset + 4);
> +			cap.ep.address = viommu_cread(dev, pci_cfg, dev_cfg, 2,
> +						      offset + 8);
> +			dev_info(&dev->dev,
> +				 "topology: adding endpoint 0x%llx -> 0x%x\n",
> +				 cap.ep.address, cap.ep.endpoint);
> +			if (viommu_add_topology(viommu_spec, &cap))
> +				return -ENOMEM;
> +			break;
> +		default:
> +			dev_warn(&dev->dev, "Unknown topo structure 0x%x\n",
> +				 cap.head.type);
> +			break;
> +		}
> +
> +		offset = cap.head.next;
> +	}
> +
> +	/* TODO: handle device removal */
> +	mutex_lock(&viommus_lock);
> +	list_add(&viommu_spec->list, &viommus);
> +	mutex_unlock(&viommus_lock);
> +
> +	return 0;
> +}
> +
> +static void viommu_pci_parse_topology(struct pci_dev *dev)
> +{
> +	int pos;
> +	u32 features;
> +	struct viommu_cap_config common = {0};
> +	struct viommu_cap_config pci_cfg = {0};
> +	struct viommu_cap_config dev_cfg = {0};
> +
> +	pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_COMMON_CFG, &common);
> +	if (!pos) {
> +		dev_warn(&dev->dev, "common capability not found\n");
> +		return;
> +	}
> +	pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_DEVICE_CFG, &dev_cfg);
> +	if (!pos) {
> +		dev_warn(&dev->dev, "device config capability not found\n");
> +		return;
> +	}
> +	pos = viommu_find_capability(dev, VIRTIO_PCI_CAP_PCI_CFG, &pci_cfg);
> +	if (!pos) {
> +		dev_warn(&dev->dev, "PCI config capability not found\n");
> +		return;
> +	}
> +
> +	/* Find out if the device supports topology description */
> +	viommu_cwrite(dev, pos, &common, 4,
> +		      offsetof(struct virtio_pci_common_cfg,
> +			       device_feature_select),
> +		      0);
> +	features = viommu_cread(dev, pos, &common, 4,
> +				offsetof(struct virtio_pci_common_cfg,
> +					 device_feature));
> +	if (!(features & VIRTIO_IOMMU_F_TOPOLOGY)) {
> +		dev_dbg(&dev->dev, "device doesn't have topology description");
> +		return;
> +	}
> +
> +	viommu_parse_topology(dev, pos, &dev_cfg);
> +}
> +
> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1014,
> +			viommu_pci_parse_topology);
> +
> +static const struct iommu_ops *virt_iommu_setup(struct device *dev)
> +{
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	const struct iommu_ops *viommu_ops = NULL;
> +	struct fwnode_handle *viommu_fwnode;
> +	struct viommu_spec *viommu_spec;
> +	struct viommu_topology *topo;
> +	struct pci_dev *pdev = NULL;
> +	struct device *viommu_dev;
> +	bool found = false;
> +	u16 devid;
> +	u32 eid;
> +	int ret;
> +
> +	/* Already translated? */
> +	if (fwspec && fwspec->ops)
> +		return fwspec->ops;
> +
> +	if (dev_is_pci(dev)) {
> +		pdev = to_pci_dev(dev);
> +		devid = pci_dev_id(pdev);
> +	} else {
> +		/* TODO: Do something with devres */
> +		return NULL;
> +	}
> +
> +	mutex_lock(&viommus_lock);
> +	list_for_each_entry(viommu_spec, &viommus, list) {
> +		list_for_each_entry(topo, &viommu_spec->topology, list) {
> +			if (pdev &&
> +			    topo->head.type == VIRTIO_IOMMU_TOPO_PCI_RANGE &&
> +			    pci_domain_nr(pdev->bus) == topo->pci.hierarchy &&
> +			    devid >= topo->pci.requester_start &&
> +			    devid <= topo->pci.requester_end) {
> +				found = true;
> +				eid = devid - topo->pci.requester_start +
> +					topo->pci.endpoint_start;
> +				break;
> +			} else if (!pdev) {
> +				/* TODO: compare address with devres */
> +			}
> +		}
> +		if (found) {
> +			viommu_ops = viommu_spec->ops;
> +			viommu_fwnode = viommu_spec->fwnode;
> +			viommu_dev = viommu_spec->dev;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&viommus_lock);
> +	if (!found)
> +		return NULL;
> +
> +	/* We're not translating ourselves, that would be silly. */
> +	if (viommu_dev == dev)
> +		return NULL;
> +
> +	if (!viommu_ops)
> +		return ERR_PTR(-EPROBE_DEFER);
> +
> +	ret = iommu_fwspec_init(dev, viommu_fwnode, viommu_ops);
> +	if (ret)
> +		return ERR_PTR(ret);
> +
> +	iommu_fwspec_add_ids(dev, &eid, 1);
> +
> +	return viommu_ops;
> +}
> +
> +/**
> + * virt_dma_configure - Configure DMA of virtualized devices
> + * @dev: the endpoint
> + *
> + * An alternative to the ACPI and DT methods to setup DMA and the IOMMU ops of a
> + * virtual device.
> + *
> + * Return: -EPROBE_DEFER if the IOMMU hasn't been loaded yet, 0 otherwise
> + */
> +int virt_dma_configure(struct device *dev)
> +{
> +	const struct iommu_ops *iommu_ops;
> +
> +	/* TODO: do we need to mess about with the dma_mask as well? */
> +	WARN_ON(!dev->dma_mask);
> +
> +	iommu_ops = virt_iommu_setup(dev);
> +	if (IS_ERR(iommu_ops)) {
> +		if (PTR_ERR(iommu_ops) == -EPROBE_DEFER)
> +			return -EPROBE_DEFER;
> +		iommu_ops = NULL;
> +	}
> +
> +	/*
> +	 * If we have reason to believe the IOMMU driver missed the initial
> +	 * add_device callback for dev, replay it to get things in order.
> +	 */
> +	if (iommu_ops && dev->bus && !device_iommu_mapped(dev))
> +		iommu_probe_device(dev);
> +
> +#ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> +	/* Assume coherent, as well as full 64-bit addresses. */
> +	arch_setup_dma_ops(dev, 0, ~0UL, iommu_ops, true);
> +#else
> +	if (iommu_ops)
> +		iommu_setup_dma_ops(dev, 0, ~0UL);
> +#endif
> +	return 0;
> +}
> +
> +/**
> + * virt_set_iommu_ops - Set the IOMMU ops of a virtual IOMMU device
> + *
> + * Setup the iommu_ops associated to a viommu_spec, once the driver is loaded
> + * and the device probed.
> + */
> +void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops)
> +{
> +	struct viommu_spec *viommu_spec;
> +
> +	mutex_lock(&viommus_lock);
> +	list_for_each_entry(viommu_spec, &viommus, list) {
> +		if (viommu_spec->dev == dev) {
> +			viommu_spec->ops = ops;
> +			viommu_spec->fwnode = ops ? dev->fwnode : NULL;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&viommus_lock);
> +}
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 9847552faecc..f68ee9615b38 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -22,6 +22,7 @@
>  #include <linux/virtio.h>
>  #include <linux/virtio_config.h>
>  #include <linux/virtio_ids.h>
> +#include <linux/virtio_iommu.h>
>  #include <linux/wait.h>
>
>  #include <uapi/linux/virtio_iommu.h>
> @@ -1134,6 +1135,7 @@ static int viommu_probe(struct virtio_device *vdev)
>  	if (ret)
>  		goto err_sysfs_remove;
>
> +	virt_set_iommu_ops(dev->parent, &viommu_ops);
>  	iommu_device_set_ops(&viommu->iommu, &viommu_ops);
>  	iommu_device_register(&viommu->iommu);
>
> @@ -1182,6 +1184,7 @@ static void viommu_remove(struct virtio_device *vdev)
>  	struct viommu_dev *viommu = vdev->priv;
>
>  	iommu_device_unregister(&viommu->iommu);
> +	virt_set_iommu_ops(vdev->dev.parent, NULL);
>  	viommu_clear_fwnode(viommu);
>  	iommu_device_sysfs_remove(&viommu->iommu);
>
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index a8124e47bf6e..d9b5e902ad18 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -17,6 +17,7 @@
>  #include <linux/suspend.h>
>  #include <linux/kexec.h>
>  #include <linux/of_device.h>
> +#include <linux/virtio_iommu.h>
>  #include <linux/acpi.h>
>  #include "pci.h"
>  #include "pcie/portdrv.h"
> @@ -1633,6 +1634,8 @@ static int pci_dma_configure(struct device *dev)
>  		struct acpi_device *adev = to_acpi_device_node(bridge->fwnode);
>
>  		ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
> +	} else if (IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)) {
> +		ret = virt_dma_configure(dev);
>  	}
>
>  	pci_put_host_bridge_device(bridge);
> diff --git a/include/linux/virtio_iommu.h b/include/linux/virtio_iommu.h
> new file mode 100644
> index 000000000000..b700256f1063
> --- /dev/null
> +++ b/include/linux/virtio_iommu.h
> @@ -0,0 +1,18 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef VIRTIO_IOMMU_H_
> +#define VIRTIO_IOMMU_H_
> +
> +#if IS_ENABLED(CONFIG_VIRTIO_IOMMU_TOPOLOGY)
> +int virt_dma_configure(struct device *dev);
> +void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops);
> +#else /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY */
> +static inline int virt_dma_configure(struct device *dev)
> +{
> +	return -ENODEV;
> +}
> +
> +static inline void virt_set_iommu_ops(struct device *dev, struct iommu_ops *ops)
> +{ }
> +#endif /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY */
> +
> +#endif /* VIRTIO_IOMMU_H_ */
> diff --git a/include/uapi/linux/virtio_iommu.h b/include/uapi/linux/virtio_iommu.h
> index 237e36a280cb..d3b7cd2a076f 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -16,6 +16,7 @@
>  #define VIRTIO_IOMMU_F_BYPASS			3
>  #define VIRTIO_IOMMU_F_PROBE			4
>  #define VIRTIO_IOMMU_F_MMIO			5
> +#define VIRTIO_IOMMU_F_TOPOLOGY			6
>
>  struct virtio_iommu_range_64 {
>  	__le64					start;
> @@ -36,6 +37,31 @@ struct virtio_iommu_config {
>  	struct virtio_iommu_range_32		domain_range;
>  	/* Probe buffer size */
>  	__le32					probe_size;
> +	/* Offset to the beginning of the topology table */
> +	__le16					topo_offset;

why do we need an offset?

> +};
> +
> +struct virtio_iommu_topo_head {
> +	__le16					type;
> +	__le16					next;
> +};

So this linked list makes things harder than necessary imho.
It will be easier to just have a counter with # of records.
Then make all records the same size.
Then just read each record out into a buffer, and handle it there.

> +
> +#define VIRTIO_IOMMU_TOPO_PCI_RANGE		0x0
> +#define VIRTIO_IOMMU_TOPO_ENDPOINT		0x1
> +
> +struct virtio_iommu_topo_pci_range {
> +	struct virtio_iommu_topo_head		head;
> +	__le32					endpoint_start;
> +	__le16					hierarchy;
> +	__le16					requester_start;
> +	__le16					requester_end;
> +	__le16					reserved;
> +};
> +
> +struct virtio_iommu_topo_endpoint {
> +	struct virtio_iommu_topo_head		head;
> +	__le32					endpoint;
> +	__le64					address;
>  };
>
>  /* Request types */
> --
> 2.24.0
Michael S. Tsirkin
2019-Nov-22 13:00 UTC
[RFC 00/13] virtio-iommu on non-devicetree platforms
On Fri, Nov 22, 2019 at 11:49:47AM +0100, Jean-Philippe Brucker wrote:
> I'm seeking feedback on multi-platform support for virtio-iommu. At the
> moment only devicetree (DT) is supported and we don't have a pleasant
> solution for other platforms. Once we figure out the topology
> description, x86 support is trivial.
>
> Since the IOMMU manages memory accesses from other devices, the guest
> kernel needs to initialize the IOMMU before endpoints start issuing DMA.
> It's a solved problem: firmware or hypervisor describes through DT or
> ACPI tables the device dependencies, and probe of endpoints is deferred
> until the IOMMU is probed. But:
>
> (1) ACPI has one table per vendor (DMAR for Intel, IVRS for AMD and IORT
>     for Arm). From my point of view IORT is easier to extend, since we
>     just need to introduce a new node type. There are no dependencies to
>     Arm in the Linux IORT driver, so it works well with CONFIG_X86.
>
>     However, there are concerns about other OS vendors feeling obligated
>     to implement this new node, so Arm proposed introducing another ACPI
>     table, that can wrap any of DMAR, IVRS and IORT to extend it with
>     new virtual nodes. A draft of this VIOT table specification is
>     available at http://jpbrucker.net/virtio-iommu/viot/viot-v5.pdf
>
>     I'm afraid this could increase fragmentation as guests would need to
>     implement or modify their support for all of DMAR, IVRS and IORT. If
>     we end up doing VIOT, I suggest limiting it to IORT.
>
> (2) In addition, there are some concerns about having virtio depend on
>     ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86
>     [1])

power?

>     don't currently implement those methods.
>
>     It was suggested to embed the topology description into the device.
>     It can work, as demonstrated at the end of this RFC, with the
>     following limitations:
>
>     - The topology description must be read before any endpoint managed
>       by the IOMMU is probed, and even before the virtio module is
>       loaded. This RFC uses a PCI quirk to manually parse the virtio
>       configuration. It assumes that all endpoints managed by the IOMMU
>       are under this same PCI host.
>
>     - I don't have a solution for the virtio-mmio transport at the
>       moment, because I haven't had time to modify a host to test it. I
>       think it could either use a notifier on the platform bus, or
>       better, a new 'iommu' command-line argument to the virtio-mmio
>       driver.

A notifier seems easier for users. What are the disadvantages of
that?

>       So the current prototype doesn't work for firecracker and
>       microvm, which rely on virtio-mmio.
>
>     - For Arm, if the platform has an ITS, the hypervisor needs IORT or
>       DT to describe it anyway. More generally, not using either ACPI or
>       DT might prevent from supporting other features as well. I suspect
>       the above users will have to implement a standard method sooner or
>       later.
>
>     - Even when reusing as much existing code as possible, guest support
>       is still going to be around a few hundred lines since we can't
>       rely on the normal virtio infrastructure to be loaded at that
>       point. As you can see below, the diffstat for the incomplete
>       topology implementation is already bigger than the exhaustive IORT
>       support, even when jumping through the VIOT hoop.
>
>     So it's a lightweight solution for very specific use-cases, and we
>     should still support ACPI for the general case. Multi-platform
>     guests such as Linux will then need to support three topology
>     descriptions instead of two.
>
> In this RFC I present both solutions, but I'd rather not keep all of it.
> Please see the individual patches for details:
>
> (1) Patches 1, 3-10 add support for virtio-iommu to the Linux IORT
>     driver and patches 2, 11 add the VIOT glue.
>
> (2) Patch 12 adds the built-in topology description to the virtio-iommu
>     specification. Patch 13 is a partial implementation for the Linux
>     virtio-iommu driver. It only supports PCI, not platform devices.
>
> You can find Linux and QEMU code on my virtio-iommu/devel branches at
> http://jpbrucker.net/git/linux and http://jpbrucker.net/git/qemu
>
>
> I split the diffstat since there are two independent features. The first
> one is for patches 1-11, and the second one for patch 13.
>
> Jean-Philippe Brucker (11):
>   ACPI/IORT: Move IORT to the ACPI folder
>   ACPI: Add VIOT definitions
>   ACPI/IORT: Allow registration of external tables
>   ACPI/IORT: Add node categories
>   ACPI/IORT: Support VIOT virtio-mmio node
>   ACPI/IORT: Support VIOT virtio-pci node
>   ACPI/IORT: Defer probe until virtio-iommu-pci has registered a fwnode
>   ACPI/IORT: Add callback to update a device's fwnode
>   iommu/virtio: Create fwnode if necessary
>   iommu/virtio: Update IORT fwnode
>   ACPI: Add VIOT table
>
>  MAINTAINERS                     |   9 +
>  drivers/acpi/Kconfig            |   7 +
>  drivers/acpi/Makefile           |   2 +
>  drivers/acpi/arm64/Kconfig      |   3 -
>  drivers/acpi/arm64/Makefile     |   1 -
>  drivers/acpi/bus.c              |   2 +
>  drivers/acpi/{arm64 => }/iort.c | 317 ++++++++++++++++++++++++++------
>  drivers/acpi/tables.c           |   2 +-
>  drivers/acpi/viot.c             |  44 +++++
>  drivers/iommu/Kconfig           |   1 +
>  drivers/iommu/virtio-iommu.c    |  61 +++++-
>  include/acpi/actbl2.h           |  31 ++++
>  include/linux/acpi_iort.h       |  14 ++
>  include/linux/acpi_viot.h       |  20 ++
>  14 files changed, 448 insertions(+), 66 deletions(-)
>  rename drivers/acpi/{arm64 => }/iort.c (86%)
>  create mode 100644 drivers/acpi/viot.c
>  create mode 100644 include/linux/acpi_viot.h
>
> Jean-Philippe Brucker (1):
>   iommu/virtio: Add topology description to virtio-iommu config space
>
>  drivers/base/platform.c               |   3 +
>  drivers/iommu/Kconfig                 |   9 +
>  drivers/iommu/Makefile                |   1 +
>  drivers/iommu/virtio-iommu-topology.c | 410 ++++++++++++++++++++++++++
>  drivers/iommu/virtio-iommu.c          |   3 +
>  drivers/pci/pci-driver.c              |   3 +
>  include/linux/virtio_iommu.h          |  18 ++
>  include/uapi/linux/virtio_iommu.h     |  26 ++
>  8 files changed, 473 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu-topology.c
>  create mode 100644 include/linux/virtio_iommu.h
>
>
> [1] firecracker: https://github.com/firecracker-microvm/firecracker
>     microvm: https://github.com/qemu/qemu/blob/master/docs/microvm.rst
>     kvmtool: https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/
> --
> 2.24.0
Jean-Philippe Brucker
2019-Nov-25 17:53 UTC
[RFC 00/13] virtio-iommu on non-devicetree platforms
On Fri, Nov 22, 2019 at 08:00:46AM -0500, Michael S. Tsirkin wrote:
> > (2) In addition, there are some concerns about having virtio depend on
> >     ACPI or DT. Some hypervisors (Firecracker, QEMU microvm, kvmtool x86
> >     [1])
>
> power?

In kvmtool it boots with device tree. It also doesn't need virtio-iommu I
think, since it has its own paravirtualized interface.

> >     don't currently implement those methods.
> >
> >     It was suggested to embed the topology description into the device.
> >     It can work, as demonstrated at the end of this RFC, with the
> >     following limitations:
> >
> >     - The topology description must be read before any endpoint managed
> >       by the IOMMU is probed, and even before the virtio module is
> >       loaded. This RFC uses a PCI quirk to manually parse the virtio
> >       configuration. It assumes that all endpoints managed by the IOMMU
> >       are under this same PCI host.
> >
> >     - I don't have a solution for the virtio-mmio transport at the
> >       moment, because I haven't had time to modify a host to test it. I
> >       think it could either use a notifier on the platform bus, or
> >       better, a new 'iommu' command-line argument to the virtio-mmio
> >       driver.
>
> A notifier seems easier for users. What are the disadvantages of
> that?

For each device we have to check if it's virtio-mmio, then map the MMIO
resource and check the device type. Having a dedicated command-line
argument would be more efficient.

Thanks,
Jean