Imre Szőllősi
2021-May-13 19:13 UTC
[Pkg-xen-devel] Bug#988477: xen-hypervisor-4.14-amd64: xen dmesg shows (XEN) AMD-Vi: IO_PAGE_FAULT on sata pci device
Package: src:xen
Version: 4.14.1+11-gb0b734a8b3-1
Severity: critical
Justification: causes serious data loss
X-Debbugs-Cc: debianbts at virtualzone.hu

Dear Maintainer,

after a clean install of bullseye/testing the xen dmesg shows the following message:

(XEN) AMD-Vi: IO_PAGE_FAULT: 0000:01:00.1 d0 addr fffffffdf8000000 flags 0x8

this is the sata device:

01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller (rev 01)

or on another mb

01:00.1 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43eb

in the case of write operations - i.e. dbench or a windows guest - there are a lot of these messages
sometimes the filesystem goes to read-only state, and the windows guest goes bsod

tested on 3 hw:

1. asus prime b450m-a, ryzen 5 2600x, md raid1, 2x samsung 1TB 860evo, lvm: problem does appear
2. asus prime b550m-k, ryzen 5 5600x, md raid1, 2x samsung 1TB 870evo, lvm: problem does appear
3. asus prime b550m-k, ryzen 5 5600x, 1x samsung 1TB 850evo, lvm: problem does not appear
3. asus prime b550m-k, ryzen 5 5600x, 1x samsung 128GB 840pro, lvm: problem does not appear
3. asus prime b550m-k, ryzen 5 5600x, samsung 1TB 850evo + samsung 128GB 840pro, lvm, dbench on 2 ssds in parallel: problem does appear

as i see, the problem does appear when data is written to 2 ssds in parallel

Thanks!

-- System Information:
Debian Release: bullseye/sid
  APT prefers testing-security
  APT policy: (500, 'testing-security'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-6-amd64 (SMP w/12 CPU threads)
Locale: LANG=hu_HU.UTF-8, LC_CTYPE=hu_HU.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

xen-hypervisor-4.14-amd64 depends on no packages.

Versions of packages xen-hypervisor-4.14-amd64 recommends:
ii  xen-hypervisor-common  4.14.1+11-gb0b734a8b3-1
ii  xen-utils-4.14         4.14.1+11-gb0b734a8b3-1

xen-hypervisor-4.14-amd64 suggests no packages.

-- no debconf information
Imre Szőllősi
2021-Jun-13 13:58 UTC
[Pkg-xen-devel] Bug#988477: Acknowledgement (xen-hypervisor-4.14-amd64: xen dmesg shows (XEN) AMD-Vi: IO_PAGE_FAULT on sata pci device)
i tested on a 4th hw

4. asus m4n78 pro, phenom ii x4 905e, md raid1, 2x samsung 1TB 860evo, lvm: problem does not appear

as i see, not all mb/chipset/sata pcie devices are affected

Thanks!
Hans van Kranenburg
2021-Aug-05 20:46 UTC
[Pkg-xen-devel] Bug#988477: Bug#988477: Acknowledgement (xen-hypervisor-4.14-amd64: xen dmesg shows (XEN) AMD-Vi: IO_PAGE_FAULT on sata pci device)
severity 988477 normal
tags 988477 + moreinfo + upstream - bullseye-ignore
thanks

Hi!

On 6/13/21 3:58 PM, Imre Szőllősi wrote:
> i tested on 4th hw
>
> 4. asus m4n78 pro, phenom ii x4 905e, md raid1, 2x samsung 1TB 860evo,
> lvm: problem does not appear
>
> as i see, not all mb/chipset/sata pcie device affected

Thanks for your report, and for trying out different combinations of hardware.

After a short internet search about the problems you're seeing while using AMD ryzen, sata, nvme and iommu, I suspect this problem does not have a lot to do with Xen specifically, but more with the hardware and its firmware. This also means that it's not a Debian packaging problem, and it cannot be fixed by me (or the Debian Xen team).

If you want to research this problem more, I can maybe be of some help by providing suggestions. Still, you will have to do all of the actual work, since I do not have your hardware here.

The first thing I would suggest is to try to reproduce the problem when booting with just Linux without Xen, and then trying the dbench test.

If you don't actually need to directly pass-through hardware to a Xen guest, you can also try disabling iommu, or researching other iommu options that can serve as a workaround.

In any case, further reports will need to have more detailed information. For example, instead of "there are a lot of messages", provide a text attachment with a piece of logging that shows these messages.

I'm tagging this bug 'moreinfo' now, since it will depend on your availability and abilities to work on it to have it advance.

Have fun,
Hans van Kranenburg
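As one possible way to turn "a lot of messages" into something attachable, here is a minimal Python sketch that extracts and counts the fault lines from saved hypervisor log text (for example, output captured from `xl dmesg`). The regular expression is modelled only on the single line quoted in the original report; the function names are made up for this example, and reading the `d0` field as the domain id is an assumption.

```python
import re
from collections import Counter

# Modelled on the one line quoted in the report:
# (XEN) AMD-Vi: IO_PAGE_FAULT: 0000:01:00.1 d0 addr fffffffdf8000000 flags 0x8
# Other Xen versions may print a different layout; adjust as needed.
FAULT_RE = re.compile(
    r"\(XEN\) AMD-Vi: IO_PAGE_FAULT: "
    r"(?P<device>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"d(?P<domain>\d+) "          # assumed to be the domain id
    r"addr (?P<addr>[0-9a-f]+) "
    r"flags (?P<flags>0x[0-9a-f]+)",
    re.IGNORECASE,
)


def parse_faults(log_text):
    """Return one dict per IO_PAGE_FAULT line found in log_text."""
    faults = []
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            faults.append(match.groupdict())
    return faults


def summarize(faults):
    """Count faults per (PCI device, flags) pair."""
    return Counter((f["device"], f["flags"]) for f in faults)
```

Passing the saved log text to `parse_faults` and printing the result of `summarize` gives a per-device count that can go into a follow-up report next to the raw log excerpt.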
Imre Szőllősi
2021-Aug-08 13:34 UTC
[Pkg-xen-devel] Bug#988477: Bug#988477: Acknowledgement (xen-hypervisor-4.14-amd64: xen dmesg shows (XEN) AMD-Vi: IO_PAGE_FAULT on sata pci device)
An HTML attachment was scrubbed... URL: <http://alioth-lists.debian.net/pipermail/pkg-xen-devel/attachments/20210808/f65cb55f/attachment.htm>
tags 988477 - moreinfo
found 988477 4.17.2+76-ge1f9cb16e2-1~deb12u1
affects 988477 src:linux
severity 988477 critical
quit

I am also observing #988477 occur. This machine has an AMD Zen 4 processor. The first observation was when the motherboard/processor was swapped out; the older motherboard/processor was several generations old.

The pattern which is emerging is Linux MD RAID1 plus a recent AMD processor which has full IOMMU functionality. The older machine was believed to have an IOMMU, but the BIOS wasn't creating appropriate ACPI tables (IVRS) and thus Xen was unable to utilize it.

This seems to be occurring with a small percentage of write operations. Subsequent read operations appear to be fine.

I am not convinced this is a Xen bug. I suspect this is instead a bug in the Linux MD subsystem. In particular, the DMA interface may have been designed assuming only a single device would ever access any page, while the MD RAID1 driver reuses the same page for both devices.

IOMMU page release could be handled by marking the page unused in a device data structure, with the entry later removed by sweeping a table. In such a case, if the MD-RAID1 driver were to redirect the page to another device between these two steps, the entry for a subsequent device could be wiped out when trying to invalidate an entry for a prior device.

Anyway, I'm also observing bug #988477. This could also be a kernel bug. So far no crashes or confirmed data loss have occurred, but sweeping the mirror does turn up small numbers of inconsistencies.

-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         ehem+sigmsg at m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
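To make the hypothesis above concrete, here is a toy Python model of the suspected ordering problem. It is not Linux, MD or Xen code: the class, the device names and the two-step "mark unused, then sweep" release are all invented for illustration, and it only shows why a table keyed by page alone could drop a mapping that a second mirror device still needs.

```python
class LazyIommuTable:
    """Toy table keyed by page only, with a deferred (swept) release."""

    def __init__(self):
        self.entries = {}     # page -> device currently allowed to DMA
        self.pending = set()  # pages marked unused, waiting for a sweep

    def map(self, page, device):
        self.entries[page] = device

    def unmap_lazy(self, page):
        # Step 1: only mark the page; the entry stays until sweep().
        self.pending.add(page)

    def sweep(self):
        # Step 2: drop every marked page, without checking who owns it now.
        for page in self.pending:
            self.entries.pop(page, None)
        self.pending.clear()

    def dma_allowed(self, page, device):
        return self.entries.get(page) == device


def simulate(sweep_before_remap):
    """Write one page to two mirror devices; report whether the second
    device still has a valid mapping when its DMA would start."""
    table = LazyIommuTable()
    page = 0x1000                 # hypothetical page number
    table.map(page, "sata0")      # first mirror leg
    table.unmap_lazy(page)        # first leg done, release is deferred
    if sweep_before_remap:
        table.sweep()
    table.map(page, "sata1")      # same page reused for the second leg
    if not sweep_before_remap:
        table.sweep()             # stale marker wipes the new entry
    return table.dma_allowed(page, "sata1")
```

In this model `simulate(True)` leaves the second device with a valid mapping, while `simulate(False)` does not, which is the kind of window that would surface as an I/O page fault on only a small fraction of writes. Whether the real code paths behave this way is exactly what remains unconfirmed.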