All, just having spotted the backport of Linux commit 03bbcb2e I notice a certain discrepancy with the Xen commit having the same purpose as well as with the actual specification updates: The Linux solution keys off of device IDs 3403 and 3406, as listed in the specification update, but this way fails to cover the X58 chipset, which has - under different numbers (62 and 69) - the same errata (the Linux commit message also only mentions erratum 53 for the 55x0 chipsets, albeit I believe 47 is as much of an issue there as it is for Xen). The Xen solution keys off of ID 342e, with no explanation in the commit description on where this association comes from. I therefore wonder whether - the Xen solution may have false positives - the Linux solution would still have false negatives even after adding ID 3405 Jan
On 08/07/13 13:29, Jan Beulich wrote:> All, > > just having spotted the backport of Linux commit 03bbcb2e I notice > a certain discrepancy with the Xen commit having the same purpose > as well as with the actual specification updates: > > The Linux solution keys off of device IDs 3403 and 3406, as listed in > the specification update, but this way fails to cover the X58 chipset, > which has - under different numbers (62 and 69) - the same errata > (the Linux commit message also only mentions erratum 53 for the > 55x0 chipsets, albeit I believe 47 is as much of an issue there as it is > for Xen). > > The Xen solution keys off of ID 342e, with no explanation in the > commit description on where this association comes from. > > I therefore wonder whether > - the Xen solution may have false positives > - the Linux solution would still have false negatives even after > adding ID 3405 > > Jan >Hi Jan, I have used the VT-d control PCI device which is common across X58/5520/5500 See page 233 http://www.intel.co.uk/content/dam/doc/datasheet/x58-express-chipset-datasheet.pdf and page 255 http://www.intel.co.uk/content/dam/doc/datasheet/5520-5500-chipset-ioh-datasheet.pdf It seems the Linux workaround only applies to the 5520 and 5500 chipsets. The pci ID database suggests 0x342e will detect 7500 chipset (which does not have the interrupt remapping errata) but that chipset never reports revision 0x13 so the errata workaround will never be applied. Malcolm
On 08/07/13 13:45, Malcolm Crossley wrote:> On 08/07/13 13:29, Jan Beulich wrote: >> All, >> >> just having spotted the backport of Linux commit 03bbcb2e I notice >> a certain discrepancy with the Xen commit having the same purpose >> as well as with the actual specification updates: >> >> The Linux solution keys off of device IDs 3403 and 3406, as listed in >> the specification update, but this way fails to cover the X58 chipset, >> which has - under different numbers (62 and 69) - the same errata >> (the Linux commit message also only mentions erratum 53 for the >> 55x0 chipsets, albeit I believe 47 is as much of an issue there as it is >> for Xen). >> >> The Xen solution keys off of ID 342e, with no explanation in the >> commit description on where this association comes from. >> >> I therefore wonder whether >> - the Xen solution may have false positives >> - the Linux solution would still have false negatives even after >> adding ID 3405 >> >> Jan >> > Hi Jan, > > I have used the VT-d control PCI device which is common across > X58/5520/5500 > > See page 233 > http://www.intel.co.uk/content/dam/doc/datasheet/x58-express-chipset-datasheet.pdfWhoops got the page number wrong it''s page 223> and > > page 255 > http://www.intel.co.uk/content/dam/doc/datasheet/5520-5500-chipset-ioh-datasheet.pdf > > It seems the Linux workaround only applies to the 5520 and 5500 chipsets. > > The pci ID database suggests 0x342e will detect 7500 chipset (which > does not have the interrupt remapping errata) but that chipset never > reports revision 0x13 so the errata workaround will never be applied. >Also, if you check the devhide registers on page 318 of the 5520/5500 datasheet you can see the Device 0 (DMI port which the Linux uses for the PCI ID match) can be hidden from the PCI scan but device 20 function 0 can''t be hidden (what Xen use''s). You may find on a Dual IOH system that the DMI port for one of the IOH''s is disabled.> Malcolm > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
On Mon, Jul 08, 2013 at 01:29:43PM +0100, Jan Beulich wrote:> All, > > just having spotted the backport of Linux commit 03bbcb2e I notice > a certain discrepancy with the Xen commit having the same purpose > as well as with the actual specification updates: > > The Linux solution keys off of device IDs 3403 and 3406, as listed in > the specification update, but this way fails to cover the X58 chipset, > which has - under different numbers (62 and 69) - the same errata > (the Linux commit message also only mentions erratum 53 for the > 55x0 chipsets, albeit I believe 47 is as much of an issue there as it is > for Xen). > > The Xen solution keys off of ID 342e, with no explanation in the > commit description on where this association comes from. > > I therefore wonder whether > - the Xen solution may have false positives > - the Linux solution would still have false negatives even after > adding ID 3405 >Don Dutile noted that there was a problem he thought he noted with an X58 chipset that was simmilar to the one I fixed in the above commit. I''m unabel to track down an errata sheet for the X58 chipset, but if you have one and can forward it to me I think it would be best (assuming that it has the same errata that the 5500/5520 chipsets do), that we just add their pci device id''s in to the existing linux errata. To answer your above questions, I''m not sure if the xen solution may offer false positives, as I''ve not looked at the code. The Linux solution may offer false negatives, but only to the extent that this problem exists in chips not covered by any errata sheet that I''ve found to date. The errata sheet referencd in the commit I have was exhaustive in the pci id''s that it covered (which wsa limited to 3403 and 3406). If there are other chips that suffer from these errata, I would like to see an errata sheet that expressly identifies both the chip id''s and the problem. Thanks Neil> Jan > >
On Mon, Jul 08, 2013 at 01:45:50PM +0100, Malcolm Crossley wrote:> On 08/07/13 13:29, Jan Beulich wrote: > >All, > > > >just having spotted the backport of Linux commit 03bbcb2e I notice > >a certain discrepancy with the Xen commit having the same purpose > >as well as with the actual specification updates: > > > >The Linux solution keys off of device IDs 3403 and 3406, as listed in > >the specification update, but this way fails to cover the X58 chipset, > >which has - under different numbers (62 and 69) - the same errata > >(the Linux commit message also only mentions erratum 53 for the > >55x0 chipsets, albeit I believe 47 is as much of an issue there as it is > >for Xen). > > > >The Xen solution keys off of ID 342e, with no explanation in the > >commit description on where this association comes from. > > > >I therefore wonder whether > >- the Xen solution may have false positives > >- the Linux solution would still have false negatives even after > > adding ID 3405 > > > >Jan > > > Hi Jan, > > I have used the VT-d control PCI device which is common across X58/5520/5500 > > See page 233 http://www.intel.co.uk/content/dam/doc/datasheet/x58-express-chipset-datasheet.pdf > > and > > page 255 http://www.intel.co.uk/content/dam/doc/datasheet/5520-5500-chipset-ioh-datasheet.pdf > > It seems the Linux workaround only applies to the 5520 and 5500 chipsets. > > The pci ID database suggests 0x342e will detect 7500 chipset (which > does not have the interrupt remapping errata) but that chipset never > reports revision 0x13 so the errata workaround will never be > applied. >So, I''m a bit confused here, does the X58 chipset have the interrupt remapping errata or not? If it does, we should add it to the Linux Quirk, but I''d love to see some documentation to this effect, so that I can identify under what conditions the quirk should be applied. Thanks! Neil> Malcolm > >
>>> On 08.07.13 at 15:33, Neil Horman <nhorman@tuxdriver.com> wrote: > On Mon, Jul 08, 2013 at 01:29:43PM +0100, Jan Beulich wrote: >> All, >> >> just having spotted the backport of Linux commit 03bbcb2e I notice >> a certain discrepancy with the Xen commit having the same purpose >> as well as with the actual specification updates: >> >> The Linux solution keys off of device IDs 3403 and 3406, as listed in >> the specification update, but this way fails to cover the X58 chipset, >> which has - under different numbers (62 and 69) - the same errata >> (the Linux commit message also only mentions erratum 53 for the >> 55x0 chipsets, albeit I believe 47 is as much of an issue there as it is >> for Xen). >> >> The Xen solution keys off of ID 342e, with no explanation in the >> commit description on where this association comes from. >> >> I therefore wonder whether >> - the Xen solution may have false positives >> - the Linux solution would still have false negatives even after >> adding ID 3405 >> > Don Dutile noted that there was a problem he thought he noted with an X58 > chipset that was simmilar to the one I fixed in the above commit. I''m > unabel to > track down an errata sheet for the X58 chipset, but if you have one and can > forward it to me I think it would be best (assuming that it has the same > errata > that the 5500/5520 chipsets do), that we just add their pci device id''s in > to the existing linux errata.Attached. I don''t have a weblink at hand, I''m sending a copy cached on my local FS.> To answer your above questions, I''m not sure if the xen solution may offer > false > positives, as I''ve not looked at the code. The Linux solution may offer > false > negatives, but only to the extent that this problem exists in chips not > covered > by any errata sheet that I''ve found to date.Yeah, the hidden question behind that really was - do we have a way to ensure no false negatives and no false positives? I.e. some consolidated set of errata listing not just an individual chipset''s revisions that are affected, but all chipsets with all their revisions. I already proded my Intel contacts for this, but am of little hope that I''ll get back a positive answer. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On Mon, Jul 08, 2013 at 02:42:55PM +0100, Jan Beulich wrote:> >>> On 08.07.13 at 15:33, Neil Horman <nhorman@tuxdriver.com> wrote: > > On Mon, Jul 08, 2013 at 01:29:43PM +0100, Jan Beulich wrote: > >> All, > >> > >> just having spotted the backport of Linux commit 03bbcb2e I notice > >> a certain discrepancy with the Xen commit having the same purpose > >> as well as with the actual specification updates: > >> > >> The Linux solution keys off of device IDs 3403 and 3406, as listed in > >> the specification update, but this way fails to cover the X58 chipset, > >> which has - under different numbers (62 and 69) - the same errata > >> (the Linux commit message also only mentions erratum 53 for the > >> 55x0 chipsets, albeit I believe 47 is as much of an issue there as it is > >> for Xen). > >> > >> The Xen solution keys off of ID 342e, with no explanation in the > >> commit description on where this association comes from. > >> > >> I therefore wonder whether > >> - the Xen solution may have false positives > >> - the Linux solution would still have false negatives even after > >> adding ID 3405 > >> > > Don Dutile noted that there was a problem he thought he noted with an X58 > > chipset that was simmilar to the one I fixed in the above commit. I''m > > unabel to > > track down an errata sheet for the X58 chipset, but if you have one and can > > forward it to me I think it would be best (assuming that it has the same > > errata > > that the 5500/5520 chipsets do), that we just add their pci device id''s in > > to the existing linux errata. > > Attached. I don''t have a weblink at hand, I''m sending a copy cached > on my local FS. > > > To answer your above questions, I''m not sure if the xen solution may offer > > false > > positives, as I''ve not looked at the code. The Linux solution may offer > > false > > negatives, but only to the extent that this problem exists in chips not > > covered > > by any errata sheet that I''ve found to date. > > Yeah, the hidden question behind that really was - do we have a > way to ensure no false negatives and no false positives? I.e. some > consolidated set of errata listing not just an individual chipset''s > revisions that are affected, but all chipsets with all their revisions. > I already proded my Intel contacts for this, but am of little hope > that I''ll get back a positive answer. >Its a limitation of the organizational system. Errata sheets are filed per-device, so you have to look at each devices sheet to know if its affected, and its on us to add devices to the appropriate quirks manually. I wouldn''t hold out for Intel on this either. They can help if you point to a specific device and errata item, but they don''t have a database thats searachble by errata text super easily. Neil> Jan >
On Mon, Jul 08, 2013 at 02:42:55PM +0100, Jan Beulich wrote:> >>> On 08.07.13 at 15:33, Neil Horman <nhorman@tuxdriver.com> wrote: > > On Mon, Jul 08, 2013 at 01:29:43PM +0100, Jan Beulich wrote: > >> All, > >> > >> just having spotted the backport of Linux commit 03bbcb2e I notice > >> a certain discrepancy with the Xen commit having the same purpose > >> as well as with the actual specification updates: > >> > >> The Linux solution keys off of device IDs 3403 and 3406, as listed in > >> the specification update, but this way fails to cover the X58 chipset, > >> which has - under different numbers (62 and 69) - the same errata > >> (the Linux commit message also only mentions erratum 53 for the > >> 55x0 chipsets, albeit I believe 47 is as much of an issue there as it is > >> for Xen). > >> > >> The Xen solution keys off of ID 342e, with no explanation in the > >> commit description on where this association comes from. > >> > >> I therefore wonder whether > >> - the Xen solution may have false positives > >> - the Linux solution would still have false negatives even after > >> adding ID 3405 > >> > > Don Dutile noted that there was a problem he thought he noted with an X58 > > chipset that was simmilar to the one I fixed in the above commit. I''m > > unabel to > > track down an errata sheet for the X58 chipset, but if you have one and can > > forward it to me I think it would be best (assuming that it has the same > > errata > > that the 5500/5520 chipsets do), that we just add their pci device id''s in > > to the existing linux errata. > > Attached. I don''t have a weblink at hand, I''m sending a copy cached > on my local FS. >Thank you I''ll see if I can find an online version to point others to, but regardless, errata 61 seems to be the kicker here. Looks like we need to add and new pci id to the quirk, and check for some additional revisions if that one is seen. I''ll get to it shortly. Thanks for sending this out. Lets hope this problem doesn''t exist in even more chipsets. Neil> > To answer your above questions, I''m not sure if the xen solution may offer > > false > > positives, as I''ve not looked at the code. The Linux solution may offer > > false > > negatives, but only to the extent that this problem exists in chips not > > covered > > by any errata sheet that I''ve found to date. > > Yeah, the hidden question behind that really was - do we have a > way to ensure no false negatives and no false positives? I.e. some > consolidated set of errata listing not just an individual chipset''s > revisions that are affected, but all chipsets with all their revisions. > I already proded my Intel contacts for this, but am of little hope > that I''ll get back a positive answer. > > Jan >
Neil Horman
2013-Jul-08 19:13 UTC
[RFC PATCH] iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
Recently we added an early quirk to detect 5500/5520 chipsets with early revisions that had problems with irq draining with interrupt remapping enabled. It turns out this same problem is present in the intel X58 chipset as well. See errata 69 here: http://www.intel.com/content/www/us/en/chipsets/x58-express-specification-update.html This patch extends the pci early quirk so that the chip devices/revisions specified in the above update are also covered in the same way: Signed-off-by: Neil Horman <nhorman@tuxdriver.com> --- arch/x86/kernel/early-quirks.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/early-quirks.c b/arch/x86/kernel/early-quirks.c index 94ab6b9..f1997b4 100644 --- a/arch/x86/kernel/early-quirks.c +++ b/arch/x86/kernel/early-quirks.c @@ -196,14 +196,21 @@ static void __init ati_bugs_contd(int num, int slot, int func) static void __init intel_remapping_check(int num, int slot, int func) { u8 revision; + u16 device; + device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID); revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID); /* * Revision 0x13 of this chipset supports irq remapping * but has an erratum that breaks its behavior, flag it as such */ - if (revision == 0x13) + if ((device == 3405) && + ((revision == 0x12) || + (revision == 0x13) || + (revision == 0x22))) { + set_irq_remapping_broken(); + } else if (revision == 0x13) set_irq_remapping_broken(); } @@ -239,8 +246,11 @@ static struct chipset early_qrk[] __initdata = { PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd }, { PCI_VENDOR_ID_INTEL, 0x3403, PCI_CLASS_BRIDGE_HOST, PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, + { PCI_VENDOR_ID_INTEL, 0x3405, PCI_CLASS_BRIDGE_HOST, + PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, { PCI_VENDOR_ID_INTEL, 0x3406, PCI_CLASS_BRIDGE_HOST, PCI_BASE_CLASS_BRIDGE, 0, intel_remapping_check }, + {} }; -- 1.8.1.4
>>> On 08.07.13 at 19:11, Neil Horman <nhorman@tuxdriver.com> wrote: > Thank you I''ll see if I can find an online version to point others to, but > regardless, errata 61 seems to be the kicker here. Looks like we need to add > and new pci id to the quirk, and check for some additional revisions if that one > is seen. I''ll get to it shortly. Thanks for sending this out. Lets hope this > problem doesn''t exist in even more chipsets.Erratum 61? That doesn''t require remapping to be disabled as workaround. 62 and 69 do. But you make a very good point regarding revisions: 62 also applies to rev 0x12, and 69 applies to both 0x12 and 0x22. Xen doesn''t check for either of them right now. Malcolm - this in turn means that keying off of device ID 0x342e indeed isn''t correct/complete. Are you going to be able to submit a Xen side fix relatively soon, or shall I take care of this? Jan
Jan Beulich
2013-Jul-09 06:44 UTC
Re: [RFC PATCH] iommu/vt-d: Expand interrupt remapping quirk to cover x58 chipset
>>> On 08.07.13 at 21:13, Neil Horman <nhorman@tuxdriver.com> wrote: > --- a/arch/x86/kernel/early-quirks.c > +++ b/arch/x86/kernel/early-quirks.c > @@ -196,14 +196,21 @@ static void __init ati_bugs_contd(int num, int slot, int func) > static void __init intel_remapping_check(int num, int slot, int func) > { > u8 revision; > + u16 device; > > + device = read_pci_config_16(num, slot, func, PCI_DEVICE_ID); > revision = read_pci_config_byte(num, slot, func, PCI_REVISION_ID); > > /* > * Revision 0x13 of this chipset supports irq remapping > * but has an erratum that breaks its behavior, flag it as such > */ > - if (revision == 0x13) > + if ((device == 3405) &&0x3405. Also, I''d do the revision == 0x13 check first (uniformly for all matched IDs), but I admit that this is a matter of taste. So consider this Reviewed-by: Jan Beulich <jbeulich@suse.com> with the above mistake fixed. Jan> + ((revision == 0x12) || > + (revision == 0x13) || > + (revision == 0x22))) { > + set_irq_remapping_broken(); > + } else if (revision == 0x13) > set_irq_remapping_broken(); > > }
On 09/07/13 07:40, Jan Beulich wrote:>>>> On 08.07.13 at 19:11, Neil Horman <nhorman@tuxdriver.com> wrote: >> Thank you I''ll see if I can find an online version to point others to, but >> regardless, errata 61 seems to be the kicker here. Looks like we need to add >> and new pci id to the quirk, and check for some additional revisions if that one >> is seen. I''ll get to it shortly. Thanks for sending this out. Lets hope this >> problem doesn''t exist in even more chipsets. > Erratum 61? That doesn''t require remapping to be disabled as > workaround. 62 and 69 do. > > But you make a very good point regarding revisions: 62 also > applies to rev 0x12, and 69 applies to both 0x12 and 0x22. > Xen doesn''t check for either of them right now. Malcolm - this > in turn means that keying off of device ID 0x342e indeed isn''t > correct/complete. Are you going to be able to submit a Xen > side fix relatively soon, or shall I take care of this?I didn''t notice that errata 69 is not listed as fixed on rev 0x22 for the X58 chipset, sorry about that. Have we had confirmation from Intel about this particular errata affecting the X58? It seems strange that they both have the same revision ID but different fixes for errata. If we have got confirmation then I''ll create a patch straight away, I''ll try our Intel contacts as well, it might speed things up a bit. Malcolm> Jan >
>>> On 09.07.13 at 11:34, Malcolm Crossley <malcolm.crossley@citrix.com> wrote: > Have we had confirmation from Intel about this particular errata affecting > the X58? It seems strange that they both have the same revision ID but > different fixes for errata. If we have got confirmation then I''ll create > a patch straight away, I''ll try our Intel contacts as well, it might speed > things up a bit.No, there was no confirmation so far. Jan