Sander Eikelenboom
2013-Feb-05 21:19 UTC
Xen-unstable boot panic due to changeset 26517 AMD, IOMMU: Clean up old entries in remapping tables when creating new one
Hi Jan,

Boot of xen-unstable is broken due to changeset 26517 on an AMD 890-FX motherboard.
The serial log is attached.

--
Sander

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Jan Beulich
2013-Feb-06 10:23 UTC
Re: Xen-unstable boot panic due to changeset 26517 AMD, IOMMU: Clean up old entries in remapping tables when creating new one
>>> On 05.02.13 at 22:19, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> Boot of xen-unstable is broken due to changeset 26517 on an AMD 890-FX
> motherboard.
> The serial log is attached.

Yeah, we were afraid of that. Unfortunately the log you provided, while very detailed, doesn't really make clear to me what is going wrong. In particular, considering there is a NULL pointer there (which I can only guess is the new pin_setup pointer), I would have expected it to crash earlier. Hence I'm attaching a patch which closes a hole in the logic, but is unlikely to address your problem. The added debugging output should help, but please also make available the xen-syms image in case the patch - as expected - doesn't help and that system of yours still crashes.

Jan
Sander Eikelenboom
2013-Feb-06 11:24 UTC
Re: Xen-unstable boot panic due to changeset 26517 AMD, IOMMU: Clean up old entries in remapping tables when creating new one
Wednesday, February 6, 2013, 11:23:29 AM, you wrote:

> Yeah, we were afraid of that. Unfortunately the log you provided,
> while very detailed, doesn't really make clear to me what is going
> wrong. In particular, considering there is a NULL pointer there
> (which I can only guess is the new pin_setup pointer), I would
> have expected it to crash earlier. Hence I'm attaching a patch
> which closes a hole in the logic, but is unlikely to address your
> problem. The added debugging output should help, but please
> also make available the xen-syms image in case the patch - as
> expected - doesn't help and that system of yours still crashes.

Hmm, with the patch it does boot, but it disables the I/O virtualization. Output of xl dmesg is attached. Do you still need a xen-syms of the situation without the debug patch (where it does crash)?

--
Sander
Jan Beulich
2013-Feb-06 12:52 UTC
Re: Xen-unstable boot panic due to changeset 26517 AMD, IOMMU: Clean up old entries in remapping tables when creating new one
>>> On 06.02.13 at 12:24, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> Hmm with the patch it does boot, but disables the I/O virtualization.

Good. While, as said before, I still don't understand why it didn't crash earlier without that patch, I'm glad it's fixed. Will post the patch for inclusion momentarily.

> Output of xl dmesg attached, do you still need a xen-syms of the situation
> without the debug patch (where it does crash) ?

And you can't expect much else with broken ACPI tables:

(XEN) AMD-Vi: IVHD Device Entry: type 0x48 id 0 flags 0xd7
(XEN) AMD-Vi: IVHD Special: 0000:00:14.0 variety 0x2 handle 0

This is a HPET entry.

(XEN) AMD-Vi: IVHD Device Entry: type 0x48 id 0 flags 0
(XEN) AMD-Vi: IVHD Special: 0000:00:00.1 variety 0x1 handle 0x7

And this is an entry for IO-APIC #2 (ID 7), whereas the MADT says

(XEN) ACPI: IOAPIC (id[0x06] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 6, version 33, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x07] address[0xfec20000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 7, version 33, address 0xfec20000, GSI 24-55

so the IOMMU table is lacking an entry for the first IO-APIC, and without that we can't set up per-device interrupt remapping (in which case we choose to disable the IOMMU altogether, albeit it had been questioned whether that isn't making a bad situation worse in some cases).

If you want the IOMMU back (at the price of re-opening the security issue described in XSA-36), you'd have to pass "iommu=amd-iommu-perdev-intremap" to the hypervisor.

Jan
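The cross-check Jan describes (every IO-APIC listed by ACPI should also have an IVHD special entry) can be sketched roughly as follows. This is only an illustration working on the log text quoted in the message, not Xen's actual code; the function names and regular expressions are made up for the example, and the meaning of "variety 0x1" (IO-APIC) versus "variety 0x2" (HPET) is taken from Jan's comments.

```python
import re

# Log lines copied from the message above.
LOG = """\
(XEN) AMD-Vi: IVHD Special: 0000:00:14.0 variety 0x2 handle 0
(XEN) AMD-Vi: IVHD Special: 0000:00:00.1 variety 0x1 handle 0x7
(XEN) ACPI: IOAPIC (id[0x06] address[0xfec00000] gsi_base[0])
(XEN) ACPI: IOAPIC (id[0x07] address[0xfec20000] gsi_base[24])
"""

# Per the thread: variety 0x1 denotes an IO-APIC, 0x2 an HPET.
VARIETY_IOAPIC = 0x1

# Hypothetical patterns, written only for the log format shown above.
ACPI_IOAPIC_RE = re.compile(r"ACPI: IOAPIC \(id\[(0x[0-9a-fA-F]+)\]")
IVHD_SPECIAL_RE = re.compile(
    r"IVHD Special: (\S+) variety (0x[0-9a-fA-F]+|\d+) handle (0x[0-9a-fA-F]+|\d+)"
)


def acpi_ioapic_ids(log):
    """Return the IO-APIC IDs announced by the 'ACPI: IOAPIC' lines."""
    return [int(m.group(1), 16) for m in ACPI_IOAPIC_RE.finditer(log)]


def ivhd_ioapic_handles(log):
    """Map IO-APIC handle -> device (BDF) for IVHD special entries of IO-APIC variety."""
    handles = {}
    for m in IVHD_SPECIAL_RE.finditer(log):
        bdf = m.group(1)
        variety = int(m.group(2), 0)
        handle = int(m.group(3), 0)
        if variety == VARIETY_IOAPIC:
            handles[handle] = bdf
    return handles


def missing_ioapics(log):
    """IO-APIC IDs known to ACPI but without an IVHD special entry."""
    covered = ivhd_ioapic_handles(log)
    return [apic_id for apic_id in acpi_ioapic_ids(log) if apic_id not in covered]


print(missing_ioapics(LOG))  # prints [6]
```

For the log in this thread the only uncovered ID is 6, which matches the "no information for IO-APIC 0x6" error reported later.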
Sander Eikelenboom
2013-Feb-06 13:39 UTC
Re: Xen-unstable boot panic due to changeset 26517 AMD, IOMMU: Clean up old entries in remapping tables when creating new one
Wednesday, February 6, 2013, 1:52:38 PM, you wrote:

> And you can't expect much else with broken ACPI tables:

Hmm yeah, it seems anything that even remotely depends on the BIOS is pretty much doomed, on both Intel and AMD. And support for, or willingness to, correct things is limited, not to say non-existent.

> so the IOMMU table is lacking an entry for the first IO-APIC, and
> without that we can't set up per-device interrupt remapping (in
> which case we choose to disable the IOMMU altogether, albeit it
> had been questioned whether that isn't making a bad situation
> worse in some cases).

> If you want the IOMMU back (at the price of re-opening the
> security issue described in XSA-36), you'd have to pass
> "iommu=amd-iommu-perdev-intremap" to the hypervisor.

I will try a newer BIOS, although I tried that in the past and it resulted in a non-booting system. But perhaps things have changed for the better.

Thanks so far!

--
Sander
Sander Eikelenboom
2013-Feb-06 21:34 UTC
Re: Xen-unstable boot panic due to changeset 26517 AMD, IOMMU: Clean up old entries in remapping tables when creating new one
Wednesday, February 6, 2013, 1:52:38 PM, you wrote:

> If you want the IOMMU back (at the price of re-opening the
> security issue described in XSA-36), you'd have to pass
> "iommu=amd-iommu-perdev-intremap" to the hypervisor.

Just for the record (the list), that should be iommu=no-amd-iommu-perdev-intremap
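Why the "no-" prefix matters can be shown with a small sketch. It assumes, as the correction above implies, that the iommu= value is a comma-separated list of boolean sub-options, that a "no-" prefix switches a sub-option off, and that per-device interrupt remapping is on by default. The parser and the DEFAULTS table are illustrative only and are not Xen's real option handling.

```python
# Assumed default for illustration: per-device interrupt remapping enabled.
DEFAULTS = {"amd-iommu-perdev-intremap": True}


def parse_iommu_option(value, defaults=DEFAULTS):
    """Apply a comma-separated iommu= value on top of boolean defaults.

    A sub-option written as 'name' turns it on, 'no-name' turns it off.
    """
    settings = dict(defaults)
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        enabled = not token.startswith("no-")
        name = token if enabled else token[len("no-"):]
        settings[name] = enabled
    return settings


# The option as first written leaves the (assumed) default unchanged ...
print(parse_iommu_option("amd-iommu-perdev-intremap"))
# ... while the corrected spelling actually switches the feature off.
print(parse_iommu_option("no-amd-iommu-perdev-intremap"))
```

Under these assumptions the first spelling is a no-op, which is why the corrected form is the one that brings the IOMMU back on this board.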
Suravee Suthikulanit
2013-Feb-08 20:14 UTC
Re: Xen-unstable boot panic due to changeset 26517 AMD, IOMMU: Clean up old entries in remapping tables when creating new one
On 2/6/2013 6:52 AM, Jan Beulich wrote:
> so the IOMMU table is lacking an entry for the first IO-APIC, and
> without that we can't set up per-device interrupt remapping (in
> which case we choose to disable the IOMMU altogether, albeit it
> had been questioned whether that isn't making a bad situation
> worse in some cases).

Jan,

It seems that all the recent issues with the AMD IOMMU regarding IOAPICs are mainly caused by mismatched information between the IVRS and the MADT. Xen sets up "nr_ioapics" by checking the number of IOAPICs reported in the MADT, while the amd/iommu_acpi.c code uses information from the IVHD entries of the IVRS to initialize the IOMMU.

Most of the issues we are seeing are triggered when the platform BIOS decides to disable one of the two IOAPICs in the RD890s configuration. I am trying to summarize the cases here:

CASE1: BIOS disables the IOAPIC in the southbridge (SB8X0 chipset)

This is the case we are seeing here with the AMD 890-FX motherboard. Here, the MADT is reporting 2 IOAPICs, as shown by the messages:

(XEN) ACPI: IOAPIC (id[0x06] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 6, version 33, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x07] address[0xfec20000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 7, version 33, address 0xfec20000, GSI 24-55

However, when Xen tries to set up the IOMMU interrupt remapping table using the IVHD entries, there is only one IOAPIC (IOAPIC 1 with apic_id 7).

(XEN) AMD-Vi: IVHD Device Entry: type 0x48 id 0 flags 0
(XEN) AMD-Vi: IVHD Special: 0000:00:00.1 variety 0x1 handle 0x7
(XEN) IVHD Error: no information for IO-APIC 0x6
(XEN) AMD-Vi: Error initialization

In this case, we should be able to look at the IVHD to correlate the IOAPIC ID (0 or 1) from the "handle" field and map it back to the BDF to set up the remapping table.

CASE2: BIOS disables the IOAPIC in the I/O bridge (RD890s chipset)

This happens in the case when we were testing the per-device interrupt remapping table patch. (I think this is the issue you might be seeing on one of the Xen test systems.) In this case, the MADT reports 1 IOAPIC while the IVRS contains two IVHD entries, both with the "handle" set to "0". Unfortunately, in this case, there is no obvious workaround, and the current solution is to disable the IOMMU.

I am working with some of the BIOS engineers and vendors to try to root-cause the issue and provide a BIOS update.

Jan, Sander:
Could you please provide the system information:
1. Motherboard vendor
2. BIOS version

Thank you,
Suravee
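The two cases summarized in this message can be told apart by comparing the IOAPIC IDs from the MADT with the IO-APIC handles found in the IVHD entries. The following is a rough sketch of that comparison, not code from amd/iommu_acpi.c; the function name and return format are invented, and the MADT ID used for CASE2 is a placeholder because the message does not give it.

```python
from collections import Counter


def classify(madt_ids, ivhd_handles):
    """Classify a MADT/IVRS combination along the lines of the two cases above.

    madt_ids:     IOAPIC IDs reported by the MADT
    ivhd_handles: 'handle' values of the IO-APIC special entries in the IVHD
    Returns (label, ids), where ids are the offending IDs or handles.
    """
    missing = sorted(set(madt_ids) - set(ivhd_handles))
    duplicated = sorted(h for h, n in Counter(ivhd_handles).items() if n > 1)

    if missing and len(ivhd_handles) < len(madt_ids):
        # MADT knows about an IOAPIC that the IVRS never mentions.
        return "CASE1", missing
    if duplicated or len(ivhd_handles) > len(madt_ids):
        # IVRS lists more IO-APIC entries than the MADT, or reuses a handle.
        return "CASE2", duplicated
    if missing:
        return "mismatch", missing
    return "consistent", []


# CASE1, with the values from this thread: MADT has IDs 6 and 7, IVHD only 7.
print(classify([6, 7], [7]))   # prints ('CASE1', [6])

# CASE2: one MADT IOAPIC (ID 0 is a placeholder), two IVHD entries with handle 0.
print(classify([0], [0, 0]))   # prints ('CASE2', [0])
```

A check of this kind only detects the inconsistency; it does not by itself say which table is right.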
Sander Eikelenboom
2013-Feb-08 20:34 UTC
Re: Xen-unstable boot panic due to changeset 26517 AMD, IOMMU: Clean up old entries in remapping tables when creating new one
Friday, February 8, 2013, 9:14:35 PM, you wrote:

> Jan, Sander:
> Could you please provide the system information:
> 1. Motherboard vendor
> 2. BIOS version

Suravee,

1. My motherboard is a "890FXA-GD70" from MSI (http://www.msi.com/product/mb/890FXA-GD70.html)
2. As for the BIOS version:
   - I'm currently running 1.8 beta1, which is a beta BIOS. It boots and works, but has the problem you described above.
   - I have tried all newer BIOSes up to the latest (1.15), but with these the system halts during boot when the IOMMU option in the BIOS is enabled. With Xen it halts right after the "(XEN) IVHD Error: no information for IO-APIC 0x6", so it could be another problem initializing the IOMMU, I'm afraid. It also halts while trying to boot a bare-metal Linux kernel. When the IOMMU is disabled it boots fine.

I hope a more direct approach via the BIOS engineers has some result; customer support most of the time has no clue and reacts like a firewall, bouncing and dropping all the packets :-(

Thanks for trying and picking it up!

--
Sander
Jan Beulich
2013-Feb-12 08:50 UTC
Re: Xen-unstable boot panic due to changeset 26517 AMD, IOMMU: Clean up old entries in remapping tables when creating new one
>>> On 08.02.13 at 21:14, Suravee Suthikulanit <suravee.suthikulpanit@amd.com> wrote:
> CASE1: BIOS disable the IOAPIC in the southbridge (SB8X0 chipset)
> This is the case we are seeing here with the AMD 890-FX motherboard.
> Here, the MADT is reporting 2 IOAPICs as shown by the message:
>
> (XEN) ACPI: IOAPIC (id[0x06] address[0xfec00000] gsi_base[0])
> (XEN) IOAPIC[0]: apic_id 6, version 33, address 0xfec00000, GSI 0-23
> (XEN) ACPI: IOAPIC (id[0x07] address[0xfec20000] gsi_base[24])
> (XEN) IOAPIC[1]: apic_id 7, version 33, address 0xfec20000, GSI 24-55
>
> However, when Xen tries to setup the IOMMU interrupt remapping table using
> IVHD entries, there is only one IOAPIC (IOAPIC 1 with apic_id 7).
>
> (XEN) AMD-Vi: IVHD Device Entry: type 0x48 id 0 flags 0
> (XEN) AMD-Vi: IVHD Special: 0000:00:00.1 variety 0x1 handle 0x7
> (XEN) IVHD Error: no information for IO-APIC 0x6
> (XEN) AMD-Vi: Error initialization

But you realize that it's the _first_ IO-APIC that has no representation in IVRS? And it can only reasonably be the 2nd that the BIOS might choose to disable (or else legacy interrupts, including the timer, wouldn't work).

> In this case, if we should be able to look at the IVHD to correlate IOAPIC
> ID (0 or 1) from the "handle" field and map it back to the BDF to setup the
> remapping table.

I don't currently see how you would figure out the BDF for the IO-APIC. Care to explain?

Jan
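Jan's point about legacy interrupts can be checked against the GSI ranges in the quoted log. The sketch below assumes that the legacy ISA interrupts (including the timer) sit on GSIs 0-15 and ignores any interrupt source overrides; the data layout and function name are made up for the example.

```python
# GSI ranges copied from the "(XEN) IOAPIC[n]: ..." lines quoted above.
IOAPICS = [
    {"apic_id": 6, "gsi_first": 0, "gsi_last": 23},
    {"apic_id": 7, "gsi_first": 24, "gsi_last": 55},
]

# Assumption for illustration: legacy ISA IRQs map onto GSIs 0-15.
LEGACY_GSIS = range(16)


def legacy_ioapic_ids(ioapics):
    """Return the IDs of the IO-APICs whose GSI range covers any legacy GSI."""
    ids = set()
    for ioapic in ioapics:
        for gsi in LEGACY_GSIS:
            if ioapic["gsi_first"] <= gsi <= ioapic["gsi_last"]:
                ids.add(ioapic["apic_id"])
                break
    return sorted(ids)


print(legacy_ioapic_ids(IOAPICS))  # prints [6]
```

Under that assumption the IO-APIC serving the legacy range is ID 6, the very one the IVRS has no entry for, which is why it is unlikely to be the one the BIOS disabled.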