Matthias
2013-Feb-09 02:12 UTC
Latest AMD, IOMMU Security Change causing CPU0 Panic and general Problems with AMD+IOMMU changes
Hi,

unfortunately your latest change "AMD,IOMMU: Clean up old entries in remapping tables when creating new one" (Changeset 26517 in xen-unstable and 25975 in xen-4.2-testing) is causing a CPU0 panic at boot for me.

When I tried to boot the latest versions of xen-unstable or xen-testing, my dom0 gives me the message:

(XEN) *************************************
(XEN) Panic on CPU 0:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000000
(XEN) *************************************

I can reproduce this with vanilla kernels 3.8-rc6, 3.7.6, 3.7.4 and Debian kernel 3.2.0-2.

My system:
CPU: AMD Phenom II 1090T
Mainboard: Asus Crosshair IV
RAM: 16GB
VGA: AMD HD5700 + AMD HD5450
OS: latest Debian Wheezy
Compiler: gcc 4.7.2

Grub lines:
multiboot /boot/xen.gz dom0_mem=2048M loglvl=all guest_loglvl=all iommu=1
module /boot/vmlinuz-3.7.6-xen root=/dev/sda1 ro xen-pciback.hide=(06:00.0)(06:00.1)(00:12.0)(00:12.2)(00:14.5)(00:16.0)(00:16.2) xen-pciback.permissive

After the error I bisected the current xen-unstable and found 26516 to be the last working revision for me. So 26517 (the changeset above) is the faulting one.

When I revert that changeset, my server boots beyond the point of the previous page fault, but runs into serious trouble later in the boot process when trying to establish the data link. I'm seeing multiple errors like:

ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata9.00: TEST_UNIT_READY failed (err_mask=0x5)

and generally a lot of disk timeouts. The server, unable to mount any root device, later drops to the initramfs.

So besides the CPU panic mentioned above, I have other issues with your changes which prevent a clean boot. Are there any prerequisites for your changes that I am missing, or can I help in any way with fixing this?
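The bisection described in the message above can be pictured as a binary search over revision numbers. The following is only a minimal sketch: `first_bad_revision` and the `boots_ok` callback are hypothetical names, the starting revisions 26500 and 26530 are made up for illustration, and the sketch assumes a linear history in which every revision after the first failing one also fails. It is not the procedure the poster actually ran; Mercurial's own `hg bisect` command automates the same idea against a real repository.

```python
from typing import Callable


def first_bad_revision(good: int, bad: int, boots_ok: Callable[[int], bool]) -> int:
    """Return the first revision in (good, bad] for which boots_ok() is False.

    Assumes `good` boots, `bad` does not, and that every revision after the
    first failing one also fails (the usual bisection precondition).
    """
    if good >= bad:
        raise ValueError("good must be older than bad")
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_ok(mid):
            good = mid  # the failure was introduced after mid
        else:
            bad = mid   # mid already fails
    return bad


if __name__ == "__main__":
    # Hypothetical stand-in for "build this revision, reboot, watch for the panic".
    # The endpoints 26500 and 26530 are illustrative, not taken from the thread.
    print(first_bad_revision(26500, 26530, lambda rev: rev <= 26516))
```

In practice `boots_ok` would be a manual step (build, install, reboot, observe), which is why the number of test boots matters: a range of 30 revisions needs at most five of them.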
Ian Campbell
2013-Feb-11 11:22 UTC
Re: Latest AMD, IOMMU Security Change causing CPU0 Panic and general Problems with AMD+IOMMU changes
On Sat, 2013-02-09 at 02:12 +0000, Matthias wrote:
> Hi,
>
> unfortunately your latest change "AMD,IOMMU: Clean up old entries in
> remapping tables when creating new one" (Changeset 26517 in
> xen-unstable and 25975 in xen-4.2-testing) is causing a CPU0 panic at
> boot for me.
>
> When I tried to boot the latest versions of xen-unstable or xen-testing,
> my dom0 gives me the message:
>
> (XEN) *************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=0000]
> (XEN) Faulting linear address: 0000000000000000
> (XEN) *************************************

I think this is fixed by "[PATCH 1/2] AMD IOMMU: also spot missing IO-APIC entries in IVRS table" posted to xen-devel on Wednesday.
Message-ID <511264B402000078000BC754@nat28.tlf.novell.com>

Ian.
Jan Beulich
2013-Feb-12 08:42 UTC
Re: Latest AMD, IOMMU Security Change causing CPU0 Panic and general Problems with AMD+IOMMU changes
>>> On 11.02.13 at 12:22, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Sat, 2013-02-09 at 02:12 +0000, Matthias wrote:
>> Hi,
>>
>> unfortunately your latest change "AMD,IOMMU: Clean up old entries in
>> remapping tables when creating new one" (Changeset 26517 in
>> xen-unstable and 25975 in xen-4.2-testing) is causing a CPU0 panic at
>> boot for me.
>>
>> When I tried to boot the latest versions of xen-unstable or xen-testing,
>> my dom0 gives me the message:
>>
>> (XEN) *************************************
>> (XEN) Panic on CPU 0:
>> (XEN) FATAL PAGE FAULT
>> (XEN) [error_code=0000]
>> (XEN) Faulting linear address: 0000000000000000
>> (XEN) *************************************
>
> I think this is fixed by "[PATCH 1/2] AMD IOMMU: also spot missing
> IO-APIC entries in IVRS table" posted to xen-devel on Wednesday.
> Message-ID <511264B402000078000BC754@nat28.tlf.novell.com>

Without a full register/stack trace I would also guess so, but in that mode it's no more than a guess really.

Jan
Matthias
2013-Feb-13 08:27 UTC
Re: Latest AMD, IOMMU Security Change causing CPU0 Panic and general Problems with AMD+IOMMU changes
Hi,

thanks for the info, I guess I have to watch xen-devel more closely. I will test this tomorrow when I get a chance and report whether the patch fixes my issue.

Do you have any idea when all the new AMD/IOMMU patches will be included in the unstable repo? Since I have some issues with the other patches as well and have seen you have around 10 patches for fixing this on xen-devel, I would generally prefer having all the fixes in the hg repo rather than patching everything by hand.

Thanks

2013/2/12 Jan Beulich <JBeulich@suse.com>:
>>>> On 11.02.13 at 12:22, Ian Campbell <Ian.Campbell@citrix.com> wrote:
>> On Sat, 2013-02-09 at 02:12 +0000, Matthias wrote:
>>> Hi,
>>>
>>> unfortunately your latest change "AMD,IOMMU: Clean up old entries in
>>> remapping tables when creating new one" (Changeset 26517 in
>>> xen-unstable and 25975 in xen-4.2-testing) is causing a CPU0 panic at
>>> boot for me.
>>>
>>> When I tried to boot the latest versions of xen-unstable or xen-testing,
>>> my dom0 gives me the message:
>>>
>>> (XEN) *************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) FATAL PAGE FAULT
>>> (XEN) [error_code=0000]
>>> (XEN) Faulting linear address: 0000000000000000
>>> (XEN) *************************************
>>
>> I think this is fixed by "[PATCH 1/2] AMD IOMMU: also spot missing
>> IO-APIC entries in IVRS table" posted to xen-devel on Wednesday.
>> Message-ID <511264B402000078000BC754@nat28.tlf.novell.com>
>
> Without a full register/stack trace I would also guess so, but in
> that mode it's no more than a guess really.
>
> Jan
>
Jan Beulich
2013-Feb-13 08:31 UTC
Re: Latest AMD, IOMMU Security Change causing CPU0 Panic and general Problems with AMD+IOMMU changes
>>> On 13.02.13 at 09:27, Matthias <matthias.kannenberg@googlemail.com> wrote:
> Do you have any idea when all the new AMD/IOMMU patches will be
> included in the unstable repo? Since I have some issues with the other
> patches as well and have seen you have around 10 patches for fixing
> this on xen-devel, I would generally prefer having all the
> fixes in the hg repo rather than patching everything by hand.

This largely depends on the pending patches getting reviewed and acked appropriately. But it's three patches only according to my counting, one of which being in need of a conceptual decision (i.e. unlikely to go in very quickly), and out of those three only one is a regression fix (the one I pointed you to).

Jan
Matthias
2013-Feb-15 02:55 UTC
Re: Latest AMD, IOMMU Security Change causing CPU0 Panic and general Problems with AMD+IOMMU changes
Hi,

I finally had the chance to test the new patches and it's great that they made it to the main repo already.

With the new patches (rev 26532 xen-unstable) I have neither the boot-time crash nor the disk access issue I reported above.

Unfortunately, the changes disable my IOMMU, rendering PCI and VGA passthrough unusable for me. Now, I might have missed something, but what exactly is the point of this at all? My Xen has been running fine with AMD IOMMU for years now, but if I still want to do this, I have to revert changes 26532, 26531, 26519, 26518 and 26517 (basically all the AMD/IOMMU changes).

When I don't revert them, I get messages at boot time saying AMD-Vi is disabled, and an error message for every PCI device I try to assign.

Is there any chance you might rethink the whole approach? Because PCI passthrough is something I use heavily.

xl dmesg output for AMD-Vi:

(XEN) Initing memory sharing.
(XEN) AMD Fam10h machine check reporting enabled
(XEN) PCI: MCFG configuration 0: base e0000000 segment 0000 buses 00 - ff
(XEN) PCI: Not using MCFG for segment 0000 bus 00-ff
(XEN) IVHD Error: no information for IO-APIC 0x6
(XEN) AMD-Vi: Error initialization
(XEN) I/O virtualisation disabled
(XEN) ENABLING IO-APIC IRQs
(XEN) -> Using new ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1

Output when starting the domU with PCI passthrough (every xc_assign_device failed line is for one PCI device I try to pass):

root@Server:~/xen/domu# xl create WORK
Parsing config from WORK
xc: info: VIRTUAL MEMORY ARRANGEMENT:
Loader: 0000000000100000->000000000019e9a8
Modules: 0000000000000000->0000000000000000
TOTAL: 0000000000000000->000000013f800000
ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
4KB PAGES: 0x0000000000000200
2MB PAGES: 0x00000000000003fb
1GB PAGES: 0x0000000000000003
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
libxl: error: libxl_pci.c:989:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:12.0
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
libxl: error: libxl_pci.c:989:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:13.0
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
libxl: error: libxl_pci.c:989:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:14.5
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
libxl: error: libxl_pci.c:989:libxl__device_pci_reset: The kernel doesn't support reset from sysfs for PCI device 0000:00:16.0
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
libxl: error: libxl_pci.c:948:do_pci_add: xc_assign_device failed
Daemon running with PID 3448

The "kernel doesn't support reset from sysfs" messages are normal and don't indicate any malfunction.

2013/2/13 Jan Beulich <JBeulich@suse.com>:
>>>> On 13.02.13 at 09:27, Matthias <matthias.kannenberg@googlemail.com> wrote:
>> Do you have any idea when all the new AMD/IOMMU patches will be
>> included in the unstable repo? Since I have some issues with the other
>> patches as well and have seen you have around 10 patches for fixing
>> this on xen-devel, I would generally prefer having all the
>> fixes in the hg repo rather than patching everything by hand.
>
> This largely depends on the pending patches getting reviewed
> and acked appropriately. But it's three patches only according to
> my counting, one of which being in need of a conceptual decision
> (i.e. unlikely to go in very quickly), and out of those three only
> one is a regression fix (the one I pointed you to).
>
> Jan
>
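The `xl dmesg` excerpt in the message above contains the three lines that show the IOMMU was not brought up. A small helper along the following lines could pick them out of a saved log. It is a sketch only: `find_iommu_problems` is a hypothetical name, and the marker strings are copied from the output in this thread rather than from any exhaustive list of Xen messages.

```python
from typing import List

# Message fragments copied from the xl dmesg output quoted in this thread.
# This is an assumption-laden shortlist, not a complete catalogue.
IOMMU_PROBLEM_MARKERS = (
    "IVHD Error:",
    "AMD-Vi: Error initialization",
    "I/O virtualisation disabled",
)


def find_iommu_problems(dmesg_text: str) -> List[str]:
    """Return the log lines that suggest the IOMMU was not enabled."""
    hits = []
    for line in dmesg_text.splitlines():
        if any(marker in line for marker in IOMMU_PROBLEM_MARKERS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    sample = """\
(XEN) PCI: Not using MCFG for segment 0000 bus 00-ff
(XEN) IVHD Error: no information for IO-APIC 0x6
(XEN) AMD-Vi: Error initialization
(XEN) I/O virtualisation disabled
(XEN) ENABLING IO-APIC IRQs
"""
    for problem in find_iommu_problems(sample):
        print(problem)
```

An empty result does not prove that passthrough will work; it only means none of these particular messages appeared in the text that was checked.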
Jan Beulich
2013-Feb-15 08:08 UTC
Re: Latest AMD, IOMMU Security Change causing CPU0 Panic and general Problems with AMD+IOMMU changes
>>> On 15.02.13 at 03:55, Matthias <matthias.kannenberg@googlemail.com> wrote:
> Unfortunately, the changes disable my IOMMU, rendering PCI and VGA
> passthrough unusable for me. Now, I might have missed something, but
> what exactly is the point of this at all? My Xen has been running fine with
> AMD IOMMU for years now, but if I still want to do this, I have to
> revert changes 26532, 26531, 26519, 26518 and 26517 (basically all the
> AMD/IOMMU changes).

No, you rather need to get your firmware fixed, because this

> (XEN) IVHD Error: no information for IO-APIC 0x6

is simply not tolerable. But you can, at the expense of security, revert to using the global interrupt remapping table, as pointed out in this same context to others before ("iommu=no-amd-iommu-perdev-intremap").

And yes, we are indeed re-thinking the situation, but in everything to consider doing you need to realize the security implications. See for instance
http://lists.xen.org/archives/html/xen-devel/2013-02/msg00591.html
and the single IOMMU consideration in
http://lists.xen.org/archives/html/xen-devel/2013-02/msg00817.html
(but which I don't think will actually work without looking at the PCI bus topology). Feel free to participate in that discussion.

Jan
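For readers who want to try the workaround Jan mentions, the sketch below rewrites a Xen command line so that its `iommu=` option carries the quoted value. `set_xen_option` is a hypothetical helper, and replacing the existing `iommu=1` outright is an assumption: the thread does not say whether the option should be combined with other `iommu=` settings. As Jan points out, this setting comes at the expense of security, so it is a stopgap rather than a substitute for fixed firmware.

```python
def set_xen_option(multiboot_line: str, key: str, value: str) -> str:
    """Replace `key=...` on a Xen command line, or append it if absent."""
    tokens = multiboot_line.split()
    new_token = f"{key}={value}"
    replaced = False
    for i, token in enumerate(tokens):
        # Match on "key=" so that e.g. "loglvl" does not hit "guest_loglvl".
        if token.startswith(key + "="):
            tokens[i] = new_token
            replaced = True
    if not replaced:
        tokens.append(new_token)
    return " ".join(tokens)


if __name__ == "__main__":
    # The multiboot line is the one posted at the start of this thread.
    original = (
        "multiboot /boot/xen.gz dom0_mem=2048M loglvl=all "
        "guest_loglvl=all iommu=1"
    )
    print(set_xen_option(original, "iommu", "no-amd-iommu-perdev-intremap"))
```

The helper only edits a string; the result still has to be placed in the boot loader configuration by hand and takes effect on the next boot.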
Matthias
2013-Feb-15 13:53 UTC
Re: Latest AMD, IOMMU Security Change causing CPU0 Panic and general Problems with AMD+IOMMU changes
Hi,

you were right. After your hint about my firmware, I did a BIOS update which ultimately fixed my issue, and now the IOMMU is activated with your patches, too. Sorry that I didn't think of this earlier.

So in general I think this is solved for me. My Xen now behaves rather strangely (random HDD / DMA read / write errors), but I can reproduce this with and without your patches, so I think this is a whole other story.

Thank you for your help

2013/2/15 Jan Beulich <JBeulich@suse.com>:
>>>> On 15.02.13 at 03:55, Matthias <matthias.kannenberg@googlemail.com> wrote:
>> Unfortunately, the changes disable my IOMMU, rendering PCI and VGA
>> passthrough unusable for me. Now, I might have missed something, but
>> what exactly is the point of this at all? My Xen has been running fine with
>> AMD IOMMU for years now, but if I still want to do this, I have to
>> revert changes 26532, 26531, 26519, 26518 and 26517 (basically all the
>> AMD/IOMMU changes).
>
> No, you rather need to get your firmware fixed, because this
>
>> (XEN) IVHD Error: no information for IO-APIC 0x6
>
> is simply not tolerable. But you can, at the expense of security,
> revert to using the global interrupt remapping table, as pointed
> out in this same context to others before
> ("iommu=no-amd-iommu-perdev-intremap").
>
> And yes, we are indeed re-thinking the situation, but in everything
> to consider doing you need to realize the security implications. See
> for instance
> http://lists.xen.org/archives/html/xen-devel/2013-02/msg00591.html
> and the single IOMMU consideration in
> http://lists.xen.org/archives/html/xen-devel/2013-02/msg00817.html
> (but which I don't think will actually work without looking at the PCI
> bus topology). Feel free to participate in that discussion.
>
> Jan
>