Mario Limonciello
2023-Nov-14 20:07 UTC
[Nouveau] [PATCH v3 0/7] Improvements to pcie_bandwidth_available() for eGPUs
The wrong values are reported from pcie_bandwidth_available() which can cause problems for performance of eGPUs. This series overhauls Thunderbolt related device detection and uses the changes to change the behavior of pcie_bandwidth_available(). v2->v3: * Stop lumping all thunderbolt VSEC and USB4 devices together, introduce is_virtual_link instead * Drop unnecessary patches Mario Limonciello (7): drm/nouveau: Switch from pci_is_thunderbolt_attached() to dev_is_removable() drm/radeon: Switch from pci_is_thunderbolt_attached() to dev_is_removable() PCI: Drop pci_is_thunderbolt_attached() PCI: pciehp: Move check for is_thunderbolt into a quirk PCI: ACPI: Detect PCIe root ports that are used for tunneling PCI: Split up some logic in pcie_bandwidth_available() to separate function PCI: Exclude PCIe ports used for virtual links in pcie_bandwidth_available() drivers/gpu/drm/nouveau/nouveau_vga.c | 6 +- drivers/gpu/drm/radeon/radeon_device.c | 4 +- drivers/gpu/drm/radeon/radeon_kms.c | 2 +- drivers/pci/hotplug/pciehp_hpc.c | 6 +- drivers/pci/pci-acpi.c | 16 ++++++ drivers/pci/pci.c | 77 +++++++++++++++++--------- drivers/pci/quirks.c | 20 +++++++ include/linux/pci.h | 24 +------- 8 files changed, 97 insertions(+), 58 deletions(-) base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86 -- 2.34.1
Mario Limonciello
2023-Nov-14 20:07 UTC
[Nouveau] [PATCH v3 1/7] drm/nouveau: Switch from pci_is_thunderbolt_attached() to dev_is_removable()
pci_is_thunderbolt_attached() looks at the hierarchy of the PCIe device to determine if any bridge along the way has the is_thunderbolt bit set. This bit will only be set when one of the devices in the hierarchy is an Intel Thunderbolt device. However PCIe devices can be connected to USB4 hubs and routers which won't necessarily set the is_thunderbolt bit. These devices will however be marked as externally facing which means they are marked removable by pci_set_removable(). Look whether the device is marked removable to determine it's connected to a Thunderbolt controller or USB4 router. Signed-off-by: Mario Limonciello <mario.limonciello at amd.com> --- v2->v3: * Update commit message --- drivers/gpu/drm/nouveau/nouveau_vga.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_vga.c b/drivers/gpu/drm/nouveau/nouveau_vga.c index f8bf0ec26844..14215b7ca187 100644 --- a/drivers/gpu/drm/nouveau/nouveau_vga.c +++ b/drivers/gpu/drm/nouveau/nouveau_vga.c @@ -94,8 +94,8 @@ nouveau_vga_init(struct nouveau_drm *drm) vga_client_register(pdev, nouveau_vga_set_decode); - /* don't register Thunderbolt eGPU with vga_switcheroo */ - if (pci_is_thunderbolt_attached(pdev)) + /* don't register USB4/Thunderbolt eGPU with vga_switcheroo */ + if (dev_is_removable(&pdev->dev)) return; vga_switcheroo_register_client(pdev, &nouveau_switcheroo_ops, runtime); @@ -118,7 +118,7 @@ nouveau_vga_fini(struct nouveau_drm *drm) vga_client_unregister(pdev); - if (pci_is_thunderbolt_attached(pdev)) + if (dev_is_removable(&pdev->dev)) return; vga_switcheroo_unregister_client(pdev); -- 2.34.1
Mario Limonciello
2023-Nov-14 20:07 UTC
[Nouveau] [PATCH v3 2/7] drm/radeon: Switch from pci_is_thunderbolt_attached() to dev_is_removable()
pci_is_thunderbolt_attached() looks at the hierarchy of the PCIe device to determine if any bridge along the way has the is_thunderbolt bit set. This bit will only be set when one of the devices in the hierarchy is an Intel Thunderbolt device. However PCIe devices can be connected to USB4 hubs and routers which won't necessarily set the is_thunderbolt bit. These devices will however be marked as externally facing which means they are marked removable by pci_set_removable(). Look whether the device is marked removable to determine it's connected to a Thunderbolt controller or USB4 router. Signed-off-by: Mario Limonciello <mario.limonciello at amd.com> --- v2->v3: * Update commit message --- drivers/gpu/drm/radeon/radeon_device.c | 4 ++-- drivers/gpu/drm/radeon/radeon_kms.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c index afbb3a80c0c6..ba0ca0694d18 100644 --- a/drivers/gpu/drm/radeon/radeon_device.c +++ b/drivers/gpu/drm/radeon/radeon_device.c @@ -1429,7 +1429,7 @@ int radeon_device_init(struct radeon_device *rdev, if (rdev->flags & RADEON_IS_PX) runtime = true; - if (!pci_is_thunderbolt_attached(rdev->pdev)) + if (!dev_is_removable(&rdev->pdev->dev)) vga_switcheroo_register_client(rdev->pdev, &radeon_switcheroo_ops, runtime); if (runtime) @@ -1519,7 +1519,7 @@ void radeon_device_fini(struct radeon_device *rdev) radeon_bo_evict_vram(rdev); radeon_audio_component_fini(rdev); radeon_fini(rdev); - if (!pci_is_thunderbolt_attached(rdev->pdev)) + if (!dev_is_removable(&rdev->pdev->dev)) vga_switcheroo_unregister_client(rdev->pdev); if (rdev->flags & RADEON_IS_PX) vga_switcheroo_fini_domain_pm_ops(rdev->dev); diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c index a16590c6247f..ead912a58ab8 100644 --- a/drivers/gpu/drm/radeon/radeon_kms.c +++ b/drivers/gpu/drm/radeon/radeon_kms.c @@ -138,7 +138,7 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags) if ((radeon_runtime_pm != 0) && radeon_has_atpx() && ((flags & RADEON_IS_IGP) == 0) && - !pci_is_thunderbolt_attached(pdev)) + !dev_is_removable(&pdev->dev)) flags |= RADEON_IS_PX; /* radeon_device_init should report only fatal error -- 2.34.1
Mario Limonciello
2023-Nov-14 20:07 UTC
[Nouveau] [PATCH v3 3/7] PCI: Drop pci_is_thunderbolt_attached()
All callers have switched to dev_is_removable() for detecting hotpluggable PCIe devices. Signed-off-by: Mario Limonciello <mario.limonciello at amd.com> --- v2->v3: * No changes --- include/linux/pci.h | 22 ---------------------- 1 file changed, 22 deletions(-) diff --git a/include/linux/pci.h b/include/linux/pci.h index 60ca768bc867..1fbca2bd92e8 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2645,28 +2645,6 @@ static inline bool pci_ari_enabled(struct pci_bus *bus) return bus->self && bus->self->ari_enabled; } -/** - * pci_is_thunderbolt_attached - whether device is on a Thunderbolt daisy chain - * @pdev: PCI device to check - * - * Walk upwards from @pdev and check for each encountered bridge if it's part - * of a Thunderbolt controller. Reaching the host bridge means @pdev is not - * Thunderbolt-attached. (But rather soldered to the mainboard usually.) - */ -static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev) -{ - struct pci_dev *parent = pdev; - - if (pdev->is_thunderbolt) - return true; - - while ((parent = pci_upstream_bridge(parent))) - if (parent->is_thunderbolt) - return true; - - return false; -} - #if defined(CONFIG_PCIEPORTBUS) || defined(CONFIG_EEH) void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type); #endif -- 2.34.1
Mario Limonciello
2023-Nov-14 20:07 UTC
[Nouveau] [PATCH v3 4/7] PCI: pciehp: Move check for is_thunderbolt into a quirk
commit 493fb50e958c ("PCI: pciehp: Assume NoCompl+ for Thunderbolt ports") added a check into pciehp code to explicitly set NoCompl+ for all Intel Thunderbolt controllers, including those that don't need it. This overloaded the purpose of the `is_thunderbolt` member of `struct pci_device` because that means that any controller that identifies as thunderbolt would set NoCompl+ even if it doesn't suffer this deficiency. As that commit helpfully specifies all the controllers with the problem, move them into a PCI quirk. Signed-off-by: Mario Limonciello <mario.limonciello at amd.com> --- v2->v3: * Reword commit message * Update comments --- drivers/pci/hotplug/pciehp_hpc.c | 6 +----- drivers/pci/quirks.c | 20 ++++++++++++++++++++ include/linux/pci.h | 1 + 3 files changed, 22 insertions(+), 5 deletions(-) diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c index b1d0a1b3917d..40f7a26fb98f 100644 --- a/drivers/pci/hotplug/pciehp_hpc.c +++ b/drivers/pci/hotplug/pciehp_hpc.c @@ -992,11 +992,7 @@ struct controller *pcie_init(struct pcie_device *dev) if (pdev->hotplug_user_indicators) slot_cap &= ~(PCI_EXP_SLTCAP_AIP | PCI_EXP_SLTCAP_PIP); - /* - * We assume no Thunderbolt controllers support Command Complete events, - * but some controllers falsely claim they do. - */ - if (pdev->is_thunderbolt) + if (pdev->no_command_complete) slot_cap |= PCI_EXP_SLTCAP_NCCS; ctrl->slot_cap = slot_cap; diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index ea476252280a..fa9b82cd7b3b 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3809,6 +3809,26 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE, quirk_thunderbolt_hotplug_msi); +/* + * Certain Thunderbolt 1 controllers falsely claim to support Command + * Completed events. + */ +static void quirk_thunderbolt_command_complete(struct pci_dev *pdev) +{ + pdev->no_command_complete = 1; +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_LIGHT_RIDGE, + quirk_thunderbolt_command_complete); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_EAGLE_RIDGE, + quirk_thunderbolt_command_complete); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_LIGHT_PEAK, + quirk_thunderbolt_command_complete); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_4C, + quirk_thunderbolt_command_complete); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_CACTUS_RIDGE_2C, + quirk_thunderbolt_command_complete); +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_PORT_RIDGE, + quirk_thunderbolt_command_complete); #ifdef CONFIG_ACPI /* * Apple: Shutdown Cactus Ridge Thunderbolt controller. diff --git a/include/linux/pci.h b/include/linux/pci.h index 1fbca2bd92e8..20a6e4fc3060 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -441,6 +441,7 @@ struct pci_dev { unsigned int is_hotplug_bridge:1; unsigned int shpc_managed:1; /* SHPC owned by shpchp */ unsigned int is_thunderbolt:1; /* Thunderbolt controller */ + unsigned int no_command_complete:1; /* No command completion */ /* * Devices marked being untrusted are the ones that can potentially * execute DMA attacks and similar. They are typically connected -- 2.34.1
Mario Limonciello
2023-Nov-14 20:07 UTC
[Nouveau] [PATCH v3 5/7] PCI: ACPI: Detect PCIe root ports that are used for tunneling
USB4 routers support a feature called "PCIe tunneling". This allows PCIe traffic to be transmitted over USB4 fabric. PCIe root ports that are used in this fashion can be discovered by device specific data that specifies the USB4 router they are connected to. For the PCI core, the specific connection information doesn't matter, but it's interesting to know that this root port is used for tunneling traffic. This will allow other decisions to be made based upon it. Detect the `usb4-host-interface` _DSD and if it's found save it into a new `is_virtual_link` bit in `struct pci_device`. Link: https://www.usb.org/document-library/usb4r-specification-v20 USB4 V2 with Errata and ECN through June 2023 Section 2.2.10.3 Link: https://learn.microsoft.com/en-us/windows-hardware/design/component-guidelines/usb4-acpi-requirements#port-mapping-_dsd-for-usb-3x-and-pcie Signed-off-by: Mario Limonciello <mario.limonciello at amd.com> --- v2->v3: * Use is_virtual_link to be future proof to other types of virtual links. * Update commit message --- drivers/pci/pci-acpi.c | 16 ++++++++++++++++ include/linux/pci.h | 1 + 2 files changed, 17 insertions(+) diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c index 004575091596..4a94d2fd8fb9 100644 --- a/drivers/pci/pci-acpi.c +++ b/drivers/pci/pci-acpi.c @@ -1415,12 +1415,28 @@ static void pci_acpi_set_external_facing(struct pci_dev *dev) dev->external_facing = 1; } +static void pci_acpi_set_virtual_link(struct pci_dev *dev) +{ + if (!pci_is_pcie(dev)) + return; + + switch (pci_pcie_type(dev)) { + case PCI_EXP_TYPE_ROOT_PORT: + case PCI_EXP_TYPE_DOWNSTREAM: + dev->is_virtual_link = device_property_present(&dev->dev, "usb4-host-interface"); + break; + default: + return; + } +} + void pci_acpi_setup(struct device *dev, struct acpi_device *adev) { struct pci_dev *pci_dev = to_pci_dev(dev); pci_acpi_optimize_delay(pci_dev, adev->handle); pci_acpi_set_external_facing(pci_dev); + pci_acpi_set_virtual_link(pci_dev); pci_acpi_add_edr_notifier(pci_dev); pci_acpi_add_pm_notifier(adev, pci_dev); diff --git a/include/linux/pci.h b/include/linux/pci.h index 20a6e4fc3060..83fbd0d7040e 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -441,6 +441,7 @@ struct pci_dev { unsigned int is_hotplug_bridge:1; unsigned int shpc_managed:1; /* SHPC owned by shpchp */ unsigned int is_thunderbolt:1; /* Thunderbolt controller */ + unsigned int is_virtual_link:1; /* Tunneled (virtual) PCIe link */ unsigned int no_command_complete:1; /* No command completion */ /* * Devices marked being untrusted are the ones that can potentially -- 2.34.1
Mario Limonciello
2023-Nov-14 20:07 UTC
[Nouveau] [PATCH v3 6/7] PCI: Split up some logic in pcie_bandwidth_available() to separate function
The logic to calculate bandwidth limits may be used at multiple call sites so split it up into its own static function instead. No intended functional changes. Suggested-by: Ilpo J?rvinen <ilpo.jarvinen at linux.intel.com> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com> --- v2->v3: * Split from previous patch version --- drivers/pci/pci.c | 60 +++++++++++++++++++++++++++-------------------- 1 file changed, 34 insertions(+), 26 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 55bc3576a985..0ff7883cc774 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -6224,6 +6224,38 @@ int pcie_set_mps(struct pci_dev *dev, int mps) } EXPORT_SYMBOL(pcie_set_mps); +static u32 pcie_calc_bw_limits(struct pci_dev *dev, u32 bw, + struct pci_dev **limiting_dev, + enum pci_bus_speed *speed, + enum pcie_link_width *width) +{ + enum pcie_link_width next_width; + enum pci_bus_speed next_speed; + u32 next_bw; + u16 lnksta; + + pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); + + next_speed = pcie_link_speed[FIELD_GET(PCI_EXP_LNKSTA_CLS, lnksta)]; + next_width = FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta); + + next_bw = next_width * PCIE_SPEED2MBS_ENC(next_speed); + + /* Check if current device limits the total bandwidth */ + if (!bw || next_bw <= bw) { + bw = next_bw; + + if (limiting_dev) + *limiting_dev = dev; + if (speed) + *speed = next_speed; + if (width) + *width = next_width; + } + + return bw; +} + /** * pcie_bandwidth_available - determine minimum link settings of a PCIe * device and its bandwidth limitation @@ -6242,39 +6274,15 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev, enum pci_bus_speed *speed, enum pcie_link_width *width) { - u16 lnksta; - enum pci_bus_speed next_speed; - enum pcie_link_width next_width; - u32 bw, next_bw; + u32 bw = 0; if (speed) *speed = PCI_SPEED_UNKNOWN; if (width) *width = PCIE_LNK_WIDTH_UNKNOWN; - bw = 0; - while (dev) { - pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta); - - next_speed = pcie_link_speed[FIELD_GET(PCI_EXP_LNKSTA_CLS, - lnksta)]; - next_width = FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta); - - next_bw = next_width * PCIE_SPEED2MBS_ENC(next_speed); - - /* Check if current device limits the total bandwidth */ - if (!bw || next_bw <= bw) { - bw = next_bw; - - if (limiting_dev) - *limiting_dev = dev; - if (speed) - *speed = next_speed; - if (width) - *width = next_width; - } - + bw = pcie_calc_bw_limits(dev, bw, limiting_dev, speed, width); dev = pci_upstream_bridge(dev); } -- 2.34.1
Mario Limonciello
2023-Nov-14 20:07 UTC
[Nouveau] [PATCH v3 7/7] PCI: Exclude PCIe ports used for virtual links in pcie_bandwidth_available()
The USB4 spec specifies that PCIe ports that are used for tunneling PCIe traffic over USB4 fabric will be hardcoded to advertise 2.5GT/s and behave as a PCIe Gen1 device. The actual performance of these ports is controlled by the fabric implementation. Callers for pcie_bandwidth_available() will always find the PCIe ports used for tunneling as a limiting factor potentially leading to incorrect performance decisions. To prevent such problems check explicitly for ports that are marked as virtual links or as thunderbolt controllers and skip them when looking for bandwidth limitations of the hierarchy. If the only device connected is a port used for tunneling then report that device. Callers to pcie_bandwidth_available() could make this change on their own as well but then they wouldn't be able to detect other potential speed bottlenecks from the hierarchy without duplicating pcie_bandwidth_available() logic. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925#note_2145860 Link: https://www.usb.org/document-library/usb4r-specification-v20 USB4 V2 with Errata and ECN through June 2023 Section 11.2.1 Signed-off-by: Mario Limonciello <mario.limonciello at amd.com> --- v2->v3: * Split from previous patch version * Look for thunderbolt or virtual link --- drivers/pci/pci.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 0ff7883cc774..b1fb2258b211 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -6269,11 +6269,20 @@ static u32 pcie_calc_bw_limits(struct pci_dev *dev, u32 bw, * limiting_dev, speed, and width pointers are supplied) information about * that point. The bandwidth returned is in Mb/s, i.e., megabits/second of * raw bandwidth. + * + * This excludes the bandwidth calculation that has been returned from a + * PCIe device that is used for transmitting tunneled PCIe traffic over a virtual + * link part of larger hierarchy. Examples include Thunderbolt3 and USB4 links. + * The calculation is excluded because the USB4 specification specifies that the + * max speed returned from PCIe configuration registers for the tunneling link is + * always PCI 1x 2.5 GT/s. When only tunneled devices are present, the bandwidth + * returned is the bandwidth available from the first tunneled device. */ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev, enum pci_bus_speed *speed, enum pcie_link_width *width) { + struct pci_dev *vdev = NULL; u32 bw = 0; if (speed) @@ -6282,10 +6291,20 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev, *width = PCIE_LNK_WIDTH_UNKNOWN; while (dev) { + if (dev->is_virtual_link || dev->is_thunderbolt) { + if (!vdev) + vdev = dev; + goto skip; + } bw = pcie_calc_bw_limits(dev, bw, limiting_dev, speed, width); +skip: dev = pci_upstream_bridge(dev); } + /* If nothing "faster" found on hierarchy, limit to first virtual link */ + if (vdev && !bw) + bw = pcie_calc_bw_limits(vdev, bw, limiting_dev, speed, width); + return bw; } EXPORT_SYMBOL(pcie_bandwidth_available); -- 2.34.1
Possibly Parallel Threads
- [PATCH 0/5] Thunderbolt GPU fixes
- [PATCH 1/5] PCI: Recognize Thunderbolt devices
- [PATCH 1/5] PCI: Recognize Thunderbolt devices
- [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"
- [PATCH 1/7] Revert "ACPI / OSI: Add OEM _OSI string to enable dGPU direct output"