Malcolm Crossley
2013-Mar-01 13:46 UTC
[PATCH v2] VTD/Intremap: Disable Intremap on Chipset 5500/5520/X58 due to errata
http://www.intel.com/content/www/us/en/chipsets/5520-and-5500-chipset-ioh-specification-update.html Stepping B-3 has two errata (#47 and #53) related to Interrupt remapping, to which the workaround is for the BIOS to completely disable interrupt remapping. These errata are fixed in stepping C-2. Unfortunately this chipset stepping is very common and many BIOSes are not disabling interrupt remapping on this stepping . We can detect this in Xen and prevent Xen from using the problematic interrupt remapping feature. The Intel 5500/5520/X58 chipset does not support VT-d Extended Interrupt Mode(EIM). This means the iommu_supports_eim() check always fails and so x2apic mode cannot be enabled in Xen before this quirk disables the interrupt remapping feature. Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> diff -r 94ece33caae2 -r 8f4d9b330e19 xen/drivers/passthrough/vtd/quirks.c --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -244,6 +244,29 @@ void vtd_ops_postamble_quirk(struct iomm } } +/* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3. + * Fixed in stepping C-2. */ +void __init tylersburg_intremap_quirk(void) +{ + uint32_t bus, device; + uint8_t rev; + + for ( bus = 0; bus < 0x100; bus++ ) + { + /* Match on System Management Registers on Device 20 Function 0 */ + device = pci_conf_read32(0, bus, 20, 0, PCI_VENDOR_ID); + rev = pci_conf_read8(0, bus, 20, 0, PCI_REVISION_ID); + + if ( rev == 0x13 && device == 0x342e8086 ) + { + dprintk(XENLOG_INFO VTDPREFIX, + "Disabling Interrupt Remapping due to Intel 5500/5520/X58 Chipset errata #47, #53\n"); + iommu_intremap = 0; + break; + } + } +} + /* initialize platform identification flags */ void __init platform_quirks_init(void) { @@ -264,6 +287,9 @@ void __init platform_quirks_init(void) /* ioremap IGD MMIO+0x2000 page */ map_igd_reg(); + + /* Tylersburg interrupt remap quirk */ + tylersburg_intremap_quirk(); } /*
Jan Beulich
2013-Mar-05 07:45 UTC
Re: [PATCH v2] VTD/Intremap: Disable Intremap on Chipset 5500/5520/X58 due to errata
>>> On 01.03.13 at 14:46, Malcolm Crossley <malcolm.crossley@citrix.com> wrote: > http://www.intel.com/content/www/us/en/chipsets/5520-and-5500-chipset-ioh-specific > ation-update.html > > Stepping B-3 has two errata (#47 and #53) related to Interrupt > remapping, to which the workaround is for the BIOS to completely disable > interrupt remapping. These errata are fixed in stepping C-2. > > Unfortunately this chipset stepping is very common and many BIOSes are > not disabling interrupt remapping on this stepping . We can detect this in > Xen and prevent Xen from using the problematic interrupt remapping feature. > > The Intel 5500/5520/X58 chipset does not support VT-d > Extended Interrupt Mode(EIM). This means the iommu_supports_eim() check > always fails and so x2apic mode cannot be enabled in Xen before this quirk > disables the interrupt remapping feature.Just like for the first version of this patch (where you didn''t really follow up on all the comment made), and in line with the XSA-36 follow up that you also asked upon some time last week, I don''t think we can take this as is. First and foremost we need to settle on a policy: - fully disable IOMMU - stay with allowing PV passthrough in this mode - also suppress PV passthrough in this mode - partially disable IOMMU in presence of platform errata (yielding an unexpectedly - to the user - insecure system) - other options? All that including ways to override the behavior towards lower security (but perhaps not towards erroneous behavior, i.e. in the case here we probably shouldn''t permit interrupt remapping to be forcibly enabled). This specifically needs to be distinguished from firmware disabling/hiding some functionality - we ought to be permitted to expect the operator to know the security level that the platform provides. My personal opinion is that in the event where we need to disable _any_ IOMMU functionality, we ought to _fully_ suppress passthrough (and hence we can as well disable the IOMMU altogether, as we did for XSA-36). We can _then_ allow the user to re-enable the IOMMU (in the case here as much as for XSA-36 e.g. via "iommu=no-intremap", i.e. the operator explicitly declaring to be willing to take the risk). That would require parts of the XSA-36 follow up patch that I had posted (other parts of it would need adjustment). Only then can we, consistently for AMD Vi and VT-d, implement workarounds like the one here. Besides that, please don''t use dprintk() except for truly debugging messages - the file/line prefixing is only producing extra noise when the message itself is unique. And - do you really need to iterate over all buses on segment 0? The X58 data sheet says at the top of section 17.1: "All devices on the IOH reside on bus 0". I wonder whether you wouldn''t instead need to do this over all segments, on each bus 0. Jan
Andrew Cooper
2013-Mar-05 11:59 UTC
Re: [PATCH v2] VTD/Intremap: Disable Intremap on Chipset 5500/5520/X58 due to errata
On 05/03/13 07:45, Jan Beulich wrote:>>>> On 01.03.13 at 14:46, Malcolm Crossley <malcolm.crossley@citrix.com> wrote: >> http://www.intel.com/content/www/us/en/chipsets/5520-and-5500-chipset-ioh-specific >> ation-update.html >> >> Stepping B-3 has two errata (#47 and #53) related to Interrupt >> remapping, to which the workaround is for the BIOS to completely disable >> interrupt remapping. These errata are fixed in stepping C-2. >> >> Unfortunately this chipset stepping is very common and many BIOSes are >> not disabling interrupt remapping on this stepping . We can detect this in >> Xen and prevent Xen from using the problematic interrupt remapping feature. >> >> The Intel 5500/5520/X58 chipset does not support VT-d >> Extended Interrupt Mode(EIM). This means the iommu_supports_eim() check >> always fails and so x2apic mode cannot be enabled in Xen before this quirk >> disables the interrupt remapping feature. > Just like for the first version of this patch (where you didn''t really > follow up on all the comment made), and in line with the XSA-36 > follow up that you also asked upon some time last week, I don''t > think we can take this as is. First and foremost we need to settle > on a policy: > > - fully disable IOMMU > - stay with allowing PV passthrough in this mode > - also suppress PV passthrough in this mode > - partially disable IOMMU in presence of platform errata > (yielding an unexpectedly - to the user - insecure system) > - other options?(Replying on behalf of Malcolm who is out of the office this week) There is only one satisfactory option, and this is to provide a report to the toolstack of what hardware is available/missing/buggy. The toolstack can then make an informed decision as to whether to use passthrough or not. The specific usecase we have in mind going forward is with driver domains. Even without interrupt remapping, it is safe to pass devices through to trusted code base driver domains. A binary switch on the Xen command line is not acceptable nor appropriate in this situation.> > All that including ways to override the behavior towards lower > security (but perhaps not towards erroneous behavior, i.e. in the > case here we probably shouldn''t permit interrupt remapping to be > forcibly enabled). > > This specifically needs to be distinguished from firmware > disabling/hiding some functionality - we ought to be permitted > to expect the operator to know the security level that the > platform provides.Why? This patch has the *exactly the same effect* as a BIOS upgrade which follows the erratum recommendations. The difference being that it is at least tweakable on the Xen command line. The real problem with this patch is because of how common these chipsets are, and the fact that no BIOSes we are aware of have implemented the erratum workaround. This bug is the root cause of at least 7 customer escalations of weird crashes, and suspected cause of 5 more. Customers do not need to be actively using PCI passthrough to have this issue screw with dom0.> > My personal opinion is that in the event where we need to disable > _any_ IOMMU functionality, we ought to _fully_ suppress > passthrough (and hence we can as well disable the IOMMU > altogether, as we did for XSA-36). We can _then_ allow the user > to re-enable the IOMMU (in the case here as much as for XSA-36 > e.g. via "iommu=no-intremap", i.e. the operator explicitly > declaring to be willing to take the risk). That would require parts > of the XSA-36 follow up patch that I had posted (other parts of it > would need adjustment).See the previously device driver usercase. Also, this is unacceptable for any business usecase of Xen. I realize that this is our problem rather than upstreams, but Citrix is certainly not alone in this regard. We cannot push a bugfix or security which regresses functionality, which is why XSA-36 is not currently in an acceptable state. It is the same reasoning as your commit for c/s 25765:e6ca45ca03c2> > Only then can we, consistently for AMD Vi and VT-d, implement > workarounds like the one here. > > Besides that, please don''t use dprintk() except for truly > debugging messages - the file/line prefixing is only producing > extra noise when the message itself is unique. > > And - do you really need to iterate over all buses on segment 0? > The X58 data sheet says at the top of section 17.1: "All devices > on the IOH reside on bus 0". I wonder whether you wouldn''t > instead need to do this over all segments, on each bus 0. > > Jan >The chipsets do not support multi-segment systems, and we have a multi-socket affected systems with multiple of these chipsets, with none of the IOH''s on bus 0. ~Andrew
Jan Beulich
2013-Mar-05 13:44 UTC
Re: [PATCH v2] VTD/Intremap: Disable Intremap on Chipset 5500/5520/X58 due to errata
>>> On 05.03.13 at 12:59, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 05/03/13 07:45, Jan Beulich wrote: >> Just like for the first version of this patch (where you didn''t really >> follow up on all the comment made), and in line with the XSA-36 >> follow up that you also asked upon some time last week, I don''t >> think we can take this as is. First and foremost we need to settle >> on a policy: >> >> - fully disable IOMMU >> - stay with allowing PV passthrough in this mode >> - also suppress PV passthrough in this mode >> - partially disable IOMMU in presence of platform errata >> (yielding an unexpectedly - to the user - insecure system) >> - other options? > > There is only one satisfactory option, and this is to provide a report > to the toolstack of what hardware is available/missing/buggy. The > toolstack can then make an informed decision as to whether to use > passthrough or not.Fine with me, but I don''t think I''ve seen a patch to this effect.> The specific usecase we have in mind going forward is with driver > domains. Even without interrupt remapping, it is safe to pass devices > through to trusted code base driver domains. A binary switch on the Xen > command line is not acceptable nor appropriate in this situation.Makes sense, albeit I''d question whether "trusted code base" is enough here (because of bugs that may still affect the whole system).>> This specifically needs to be distinguished from firmware >> disabling/hiding some functionality - we ought to be permitted >> to expect the operator to know the security level that the >> platform provides. > > Why? This patch has the *exactly the same effect* as a BIOS upgrade > which follows the erratum recommendations.But this difference is - as said - that it''s still a firmware decision then, and my statement on the operator being expected to be aware holds. Plus the political aspect: We''re going to be held responsible (with "we" being xen.org or individual vendors) for the security breach if we disable a security relevant function of the hardware. Whereas when the BIOS does so, it''s the BIOS vendor''s liability.> The difference being that it > is at least tweakable on the Xen command line. The real problem with > this patch is because of how common these chipsets are, and the fact > that no BIOSes we are aware of have implemented the erratum workaround. > > This bug is the root cause of at least 7 customer escalations of weird > crashes, and suspected cause of 5 more. Customers do not need to be > actively using PCI passthrough to have this issue screw with dom0.And I didn''t say I don''t want a workaround. What I said is that the way it is being implemented is imo not suitable.>> My personal opinion is that in the event where we need to disable >> _any_ IOMMU functionality, we ought to _fully_ suppress >> passthrough (and hence we can as well disable the IOMMU >> altogether, as we did for XSA-36). We can _then_ allow the user >> to re-enable the IOMMU (in the case here as much as for XSA-36 >> e.g. via "iommu=no-intremap", i.e. the operator explicitly >> declaring to be willing to take the risk). That would require parts >> of the XSA-36 follow up patch that I had posted (other parts of it >> would need adjustment). > > See the previously device driver usercase. > > Also, this is unacceptable for any business usecase of Xen. I realize > that this is our problem rather than upstreams, but Citrix is certainly > not alone in this regard. We cannot push a bugfix or security which > regresses functionality, which is why XSA-36 is not currently in an > acceptable state. It is the same reasoning as your commit for c/s > 25765:e6ca45ca03c2Not really. The way that one is done is not sacrificing security for functionality, at least not by default. It merely allows an operator to decide to do so. Of course, here we''re talking about trading security for stability, and one would generally expect stability to come first, then security, then functionality. But then again that''s not the question here, as we''re not discussing whether we need a fix (workaround), but how to implement it. And there clearly are ways to implement the workaround in a secure fashion - see above.>> And - do you really need to iterate over all buses on segment 0? >> The X58 data sheet says at the top of section 17.1: "All devices >> on the IOH reside on bus 0". I wonder whether you wouldn''t >> instead need to do this over all segments, on each bus 0. > > The chipsets do not support multi-segment systems, and we have a > multi-socket affected systems with multiple of these chipsets, with none > of the IOH''s on bus 0.That''s contrary to the spec then, and will need clarification. Don, Xiantao? Jan
Zhang, Xiantao
2013-Mar-12 21:55 UTC
Re: [PATCH v2] VTD/Intremap: Disable Intremap on Chipset 5500/5520/X58 due to errata
> > >> And - do you really need to iterate over all buses on segment 0? > >> The X58 data sheet says at the top of section 17.1: "All devices > >> on the IOH reside on bus 0". I wonder whether you wouldn''t > >> instead need to do this over all segments, on each bus 0. > > > > The chipsets do not support multi-segment systems, and we have a > > multi-socket affected systems with multiple of these chipsets, with none > > of the IOH''s on bus 0. > > That''s contrary to the spec then, and will need clarification. > Don, Xiantao?Hi, Jan For x58, it is true because it is a UP platform, but it is not true for 5500/5520, it depends on how many IOHs are in the platforms, and also depends on how it is configured by platform, so the spec may only say x58''s IOH resides in bus0, but not for applicable for all. Xiantao
Jan Beulich
2013-Mar-13 08:48 UTC
Re: [PATCH v2] VTD/Intremap: Disable Intremap on Chipset 5500/5520/X58 due to errata
>>> On 12.03.13 at 22:55, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote: >> >> >> And - do you really need to iterate over all buses on segment 0? >> >> The X58 data sheet says at the top of section 17.1: "All devices >> >> on the IOH reside on bus 0". I wonder whether you wouldn''t >> >> instead need to do this over all segments, on each bus 0. >> > >> > The chipsets do not support multi-segment systems, and we have a >> > multi-socket affected systems with multiple of these chipsets, with none >> > of the IOH''s on bus 0. >> >> That''s contrary to the spec then, and will need clarification. >> Don, Xiantao? > > For x58, it is true because it is a UP platform, but it is not true for > 5500/5520, it depends on how many IOHs are in the platforms, and also depends > on how it is configured by platform, so the spec may only say x58''s IOH > resides in bus0, but not for applicable for all.In which case (considering that neither the most recent [and only] data sheet rev nor the spec update say so) the question is whether there''s any restriction which buses to look at, or whether the loop really needs to go over all buses on all segments (the latter of which is a problem on its own, as Xen may not be able to access the config space of devices on segments other than 0, as the approval to use MMCFG may need to come from Dom0). For Andrew''s statement above regarding these chipsets not supporting multi- segment setups I haven''t found confirmation anywhere, and (considering the potential of extra gluing hardware) may be hard to be had. Jan
Malcolm Crossley
2013-Mar-13 11:02 UTC
Re: [PATCH v2] VTD/Intremap: Disable Intremap on Chipset 5500/5520/X58 due to errata
On 13/03/13 08:48, Jan Beulich wrote:>>>> On 12.03.13 at 22:55, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote: >>> >>>>> And - do you really need to iterate over all buses on segment 0? >>>>> The X58 data sheet says at the top of section 17.1: "All devices >>>>> on the IOH reside on bus 0". I wonder whether you wouldn''t >>>>> instead need to do this over all segments, on each bus 0. >>>> The chipsets do not support multi-segment systems, and we have a >>>> multi-socket affected systems with multiple of these chipsets, with none >>>> of the IOH''s on bus 0. >>> That''s contrary to the spec then, and will need clarification. >>> Don, Xiantao? >> For x58, it is true because it is a UP platform, but it is not true for >> 5500/5520, it depends on how many IOHs are in the platforms, and also depends >> on how it is configured by platform, so the spec may only say x58''s IOH >> resides in bus0, but not for applicable for all. > In which case (considering that neither the most recent [and only] > data sheet rev nor the spec update say so) the question is whether > there''s any restriction which buses to look at, or whether the loop > really needs to go over all buses on all segments (the latter of > which is a problem on its own, as Xen may not be able to access the > config space of devices on segments other than 0, as the approval > to use MMCFG may need to come from Dom0). For Andrew''s > statement above regarding these chipsets not supporting multi- > segment setups I haven''t found confirmation anywhere, and > (considering the potential of extra gluing hardware) may be hard > to be had.Section 7.3.1 of the 5520-5500 datasheet describes how the PCI bus numbers are setup for each 55xx chipset. The datasheet confusing refers to segments but it is refering to subsets of the 256 buses in segment 0. This is confirmed by the Bus subset control registers LCFGBUS and GCFGBUS having no support for a segment ID and only support for Bus ranges. I have read the Intel 7500 series datasheet volume 2 section 4 and Intel 5500/7500 chipset series datasheet volume 2 section 7 and come to the understanding that PCI segments are implemented at a processor level and not at the chipset level. This means the processor needs an additional MMCONFIG address decoder in order know which address ranges go to which PCI segments. The Intel 7500 series has that additional MMCONFIG address decoder capability (Intel 7500 series datasheet volume 2 section 4.2.4.3 ) but the Intel 5500/5600 series processors only has 1 MMCONFIG decoder (SAD_PCIEXBAR). It would be good for Intel to confirm that Intel 5500/5600 series processors can''t support multiple PCI segments though. My impression is that the Intel 5500/5520/X58 series chipsets and Intel 5500/5600 series processors are targeted at two socket maximum and so don''t support large scale features like x2apic or multiple PCI segments. The Intel EX series processors like the 7500 series, E7-88xx series and Intel 7500 series chipset go to 8 sockets or more and so support x2apic and multiple PCI segments, luckily the Intel 7500 series chipset does not suffer from this errata. Regards Malcolm