Hi Jan,

Attached patch dumps io page fault flags. The flags show the reason of the fault and tell us if this is an unmapped interrupt fault or a DMA fault.

Thanks,
Wei

signed-off-by: Wei Wang <wei.wang2@amd.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Sander Eikelenboom
2012-Sep-05 22:59 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Wednesday, September 5, 2012, 4:42:42 PM, you wrote:

> Hi Jan,
> Attached patch dumps io page fault flags. The flags show the reason of
> the fault and tell us if this is an unmapped interrupt fault or a DMA fault.

> Thanks,
> Wei

> signed-off-by: Wei Wang <wei.wang2@amd.com>

I have applied the patch and the flags seem to differ between the faults:

AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
(XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
(XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020
(XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020

Complete xl dmesg and lspci -vvvknn attached.

Thx

--
Sander
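For readers following the log lines above, here is a minimal Python sketch that pulls the fields out of an AMD-Vi IO_PAGE_FAULT line and turns the 16-bit device id into the usual bus:device.function notation. The regular expression only targets the line format shown in this thread, and the helper names (`parse_fault`, `devid_to_bdf`) are made up for illustration; they are not part of Xen.

```python
import re

# Matches the IO_PAGE_FAULT lines as printed in this thread. The "flags"
# field is optional because logs from an unpatched Xen do not contain it.
FAULT_RE = re.compile(
    r"AMD-Vi: IO_PAGE_FAULT: domain = (?P<domain>\d+), "
    r"device id = 0x(?P<devid>[0-9a-fA-F]+), "
    r"fault address = 0x(?P<addr>[0-9a-fA-F]+)"
    r"(?:, flags = 0x(?P<flags>[0-9a-fA-F]+))?"
)


def devid_to_bdf(devid):
    """Split a 16-bit PCI requester id into bus:device.function."""
    bus = (devid >> 8) & 0xFF
    dev = (devid >> 3) & 0x1F
    fn = devid & 0x7
    return f"{bus:02x}:{dev:02x}.{fn:x}"


def parse_fault(line):
    """Return the decoded fields of one fault line, or None if it does not match."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    flags = m.group("flags")
    return {
        "domain": int(m.group("domain")),
        "bdf": devid_to_bdf(int(m.group("devid"), 16)),
        "address": int(m.group("addr"), 16),
        "flags": None if flags is None else int(flags, 16),
    }
```

With this split, device id 0x0a06 corresponds to 0a:00.6 and 0x0700 to 07:00.0, which is how the ids can be matched against the lspci output.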
On 09/06/2012 12:59 AM, Sander Eikelenboom wrote:
> I have applied the patch and the flags seem to differ between the faults:
>
> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020
> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020

OK, so they are not interrupt requests. I guess further information from your system would be helpful to debug this issue:

1) xl info
2) xl list
3) lspci -vvv (NOTE: not in dom0 but in your guest)
4) cat /proc/iomem (in both dom0 and your hvm guest)

* I would also like to know the symptoms of device 0x0700 when IO_PF happened. Did it stop working?

(BTW: I copied a few options from your boot cmd line and it worked with my RD890 system

dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug apic=debug iommu=on,verbose,debug,no-sharept

* so, what OEM board do you have?)

Also from your log, these lines look very strange:

(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xd5, mfn=0xa4a11
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xd7, mfn=0xa4a0f
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xd9, mfn=0xa4a0d
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xdb, mfn=0xa4a0b
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xdd, mfn=0xa4a09
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xdf, mfn=0xa4a07
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe1, mfn=0xa4a05
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe3, mfn=0xa4a03
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe5, mfn=0xa4a01
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe7, mfn=0xa463f
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe9, mfn=0xa463d
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xeb, mfn=0xa463b
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xed, mfn=0xa4639
(XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xef, mfn=0xa4637
(XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0
(XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa90f8300
(XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa90f8340
(XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa90f8380
(XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa90f83c0

* they are just followed by the IO PAGE fault. Do you know where they are from? Your video card driver maybe?

Thanks,
Wei
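The conclusion that these faults are not interrupt requests can be reproduced from the flags values with a small Python sketch. The bit masks below are an assumption: they follow the constants I know from the Linux AMD IOMMU driver (EVENT_FLAG_I = 0x008, EVENT_FLAG_PR = 0x010, EVENT_FLAG_RW = 0x020, EVENT_FLAG_TR = 0x100) and should be checked against the AMD IOMMU specification before relying on them. The function names are hypothetical.

```python
# Bit masks inside the 12-bit flags field of an IO_PAGE_FAULT event.
# Assumed layout, see the note above.
FLAG_BITS = {
    "I": 0x008,   # fault was caused by an interrupt request
    "PR": 0x010,  # page was marked present (so a permission problem)
    "RW": 0x020,  # transaction was a write (clear means read)
    "TR": 0x100,  # transaction was a translation request
}


def decode_flags(flags):
    """Map each known flag name to True/False for the given flags value."""
    return {name: bool(flags & mask) for name, mask in FLAG_BITS.items()}


def describe(flags):
    """Give a one-line, human-readable summary of a flags value."""
    f = decode_flags(flags)
    if f["I"]:
        return "interrupt request"
    access = "write" if f["RW"] else "read"
    cause = "page present, permission fault" if f["PR"] else "page not present"
    return f"DMA {access}, {cause}"
```

Under these assumptions, flags = 0x000 (device 0x0a06) reads as a DMA read to a page that is not present, and flags = 0x020 (device 0x0700) as a DMA write to a page that is not present. Neither value has the interrupt bit set.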
Sander Eikelenboom
2012-Sep-06 13:50 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Thursday, September 6, 2012, 3:32:51 PM, you wrote:

> OK, so they are not interrupt requests. I guess further information from
> your system would be helpful to debug this issue:
> 1) xl info
> 2) xl list
> 3) lspci -vvv (NOTE: not in dom0 but in your guest)
> 4) cat /proc/iomem (in both dom0 and your hvm guest)

dom14 is not a HVM guest, it's a PV guest.
I will try to make a complete package, and try with one pv domain only where the devices are being passed through just to simplify the setup.

> * I would also like to know the symptoms of device 0x0700 when IO_PF
> happened. Did it stop working?

Yes it stops working, the video capture just freezes, but the driver doesn't bail out.
For the USB controller (0x0a06) it starts to give errors for usbdev_open in the guest.

> (BTW: I copied a few options from your boot cmd line and it worked with
> my RD890 system

> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps
> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug
> apic=debug iommu=on,verbose,debug,no-sharept

> * so, what OEM board do you have?)

MSI 890FXA-GD70

> Also from your log, these lines look very strange:

> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xd5, mfn=0xa4a11
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xd7, mfn=0xa4a0f
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xd9, mfn=0xa4a0d
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xdb, mfn=0xa4a0b
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xdd, mfn=0xa4a09
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xdf, mfn=0xa4a07
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe1, mfn=0xa4a05
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe3, mfn=0xa4a03
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe5, mfn=0xa4a01
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe7, mfn=0xa463f
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xe9, mfn=0xa463d
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xeb, mfn=0xa463b
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xed, mfn=0xa4639
> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to read-only memory page. gfn=0xef, mfn=0xa4637
> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0
> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa90f8300
> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa90f8340
> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa90f8380
> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa90f83c0

> * they are just followed by the IO PAGE fault. Do you know where they are
> from? Your video card driver maybe?

From a HVM domain with an old (3.0.3) kernel, but the faults also occur without this domain being started.
On 09/06/2012 03:50 PM, Sander Eikelenboom wrote:
> dom14 is not a HVM guest, it's a PV guest.

Ah, I see. A PV guest is quite different from hvm: it does not use p2m tables as io page tables, so the no-sharept option does not apply in this case. PV guests always use separate io page tables. There might be some incorrect mappings in the page tables. I will check this on my side.

Thanks,
Wei
Sander Eikelenboom
2012-Sep-06 15:08 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Thursday, September 6, 2012, 5:03:05 PM, you wrote:

> Ah, I see. A PV guest is quite different from hvm: it does not use p2m tables
> as io page tables, so the no-sharept option does not apply in this case. PV
> guests always use separate io page tables. There might be some
> incorrect mappings in the page tables. I will check this on my side.

> Thanks,
> Wei

In that case it's perhaps mysteriously semi related to a p2m bug in kernels > 3.4 which freezes guests on my intel box.
Though guests start fine on the amd box with kernels > 3.4, perhaps it does give issues for iommu if those are tied somehow.
Sander Eikelenboom
2012-Sep-07 07:32 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Thursday, September 6, 2012, 5:03:05 PM, you wrote:

> Ah, I see. A PV guest is quite different from hvm: it does not use p2m tables
> as io page tables, so the no-sharept option does not apply in this case. PV
> guests always use separate io page tables. There might be some
> incorrect mappings in the page tables. I will check this on my side.

I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept everything else the same.
I haven't seen any IO PAGE FAULTS after that.

I did spot some differences in the output from lspci between xen 4.1 and 4.2, related to MSI enabled or not for the IOMMU device.
Have attached the xl/xm dmesg and lspci from booting with both versions.

lspci:

00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
        Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
        Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 10
        Capabilities: [40] Secure device <?>
4.1:    Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
4.2:    Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee0100c  Data: 4128
        Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+

Although it seems enabled, shouldn't the IRQ number used be much higher than 10 for MSI interrupts?

There is another difference in the bridge device that's in front of the 0a:00.6 device that faults before the kernel is even booted.

00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port C) [1002:5a17] (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
4.1:    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
4.2:    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
        I/O behind bridge: 0000f000-00000fff
        Memory behind bridge: f9f00000-f9ffffff
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
4.1:    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
4.2:    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag+ RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <1us, L1 <8us
                        ClockPM- Surprise- LLActRep+ BwNot+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
                SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                        Slot #3, PowerLimit 10.000W; Interlock- NoCompl+
                SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                        Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
                        Changed: MRL- PresDet+ LinkState+

serveerstertje:~# lspci -t
-[0000:00]-+-00.0
           +-00.2
           +-02.0-[0b]----00.0
           +-03.0-[0a]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            +-00.4
           |            +-00.5
           |            +-00.6
           |            \-00.7
           +-05.0-[09]----00.0
           +-06.0-[08]----00.0
           +-0a.0-[07]----00.0
           +-0b.0-[06]--+-00.0
           |            \-00.1
           +-0c.0-[05]----00.0
           +-0d.0-[04]--+-00.0
           |            +-00.1
           |            +-00.2
           |            +-00.3
           |            +-00.4
           |            +-00.5
           |            +-00.6
           |            \-00.7
           +-11.0
           +-12.0
           +-12.2
           +-13.0
           +-13.2
           +-14.0
           +-14.3
           +-14.4-[03]----06.0
           +-14.5
           +-15.0-[02]--
           +-16.0
           +-16.2
           +-18.0
           +-18.1
           +-18.2
           +-18.3
           \-18.4
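The Address/Data pair that lspci reports for the IOMMU's MSI capability under Xen 4.2 (Address 00000000fee0100c, Data 4128) can be unpacked with a short Python sketch. It assumes the standard x86 MSI message layout and that lspci prints both values in hexadecimal. The function name `decode_msi` and the dictionary keys are made up for illustration.

```python
def decode_msi(address, data):
    """Unpack an x86 MSI address/data pair into its architectural fields.

    Assumes the usual x86 layout: destination id in address bits 19:12,
    redirection hint in bit 3, destination mode in bit 2; vector in data
    bits 7:0, delivery mode in bits 10:8, level in bit 14, trigger mode
    in bit 15.
    """
    return {
        "dest_id": (address >> 12) & 0xFF,
        "redirection_hint": bool(address & 0x8),
        "logical_dest_mode": bool(address & 0x4),
        "vector": data & 0xFF,
        "delivery_mode": (data >> 8) & 0x7,
        "level_assert": bool(data & 0x4000),
        "level_triggered": bool(data & 0x8000),
    }


# Values taken from the 4.2 lspci output in this thread.
fields = decode_msi(0xFEE0100C, 0x4128)
```

Under these assumptions the interrupt vector is carried in the low byte of Data (0x28 here), independent of the legacy "IRQ 10" value that lspci shows on the Interrupt line.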
On 09/07/2012 09:32 AM, Sander Eikelenboom wrote:> > Thursday, September 6, 2012, 5:03:05 PM, you wrote: > >> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote: >>> >>> Thursday, September 6, 2012, 3:32:51 PM, you wrote: >>> >>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote: >>>>> >>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote: >>>>> >>>>>> Hi Jan, >>>>>> Attached patch dumps io page fault flags. The flags show the reason of >>>>>> the fault and tell us if this is an unmapped interrupt fault or a DMA fault. >>>>> >>>>>> Thanks, >>>>>> Wei >>>>> >>>>>> signed-off-by: Wei Wang<wei.wang2@amd.com> >>>>> >>>>> >>>>> I have applied the patch and the flags seem to differ between the faults: >>>>> >>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000 >>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000 >>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020 >>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020 >>> >>>> OK, so they are not interrupt requests. I guess further information from >>>> your system would be helpful to debug this issue: >>>> 1) xl info >>>> 2) xl list >>>> 3) lscpi -vvv (NOTE: not in dom0 but in your guest) >>>> 4) cat /proc/iomem (in both dom0 and your hvm guest) >>> >>> dom14 is not a HVM guest,it''s a PV guest. > >> Ah, I see. PV guest is quite different than hvm, it does use p2m tables >> as io page tables. So no-sharept option does not work in this case. PV >> guests always use separated io page tables. There might be some >> incorrect mappings on the page tables. I will check this on my side. > > I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept everything else the same. > I haven''t seen any IO PAGE FAULTS after that. 
>
> I did spot some differences in the output from lspci between xen 4.1 and 4.2, related to MSI enabled or not for the IOMMU device.
> Have attached the xl/xm dmesg and lspci from booting with both versions.
>
> lspci:
>
> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>     Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>     Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0
>     Interrupt: pin A routed to IRQ 10
>     Capabilities: [40] Secure device <?>
> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+

Eh... That is interesting. So which dom0 are you using? There is a c/s in 4.2 that prevents recent dom0 kernels from disabling the IOMMU interrupt (changeset 25492:61844569a432). Without it, the IOMMU cannot send any events, including IO PAGE faults. You could try reverting dom0 to an old version like 2.6 pv_ops to see if you really have no IO page faults on 4.1.

> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>     Address: 00000000fee0100c Data: 4128
>     Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>
> Although it seems enabled, shouldn't the IRQ number used be much higher than 10 for MSI interrupts ?

The IRQ number is fine. The MSI vector is stored at Data: 4128.

>
> There is another difference in the bridge device that's in front of the 0a:00.6 device that faults before the kernel is even booted.
> > 00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port C) [1002:5a17] (prog-if 00 [Normal decode]) > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > 4.1: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort->SERR-<PERR- INTx- > 4.2: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort+<MAbort->SERR-<PERR- INTx- > Latency: 0, Cache Line Size: 64 bytes > Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0 > I/O behind bridge: 0000f000-00000fff > Memory behind bridge: f9f00000-f9ffffff > Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff > 4.1: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort-<SERR-<PERR- > 4.2: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast>TAbort+<TAbort-<MAbort-<SERR-<PERR- > BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort->Reset- FastB2B- > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > Capabilities: [50] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00 > DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s<64ns, L1<1us > ExtTag+ RBE+ FLReset- > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ > MaxPayload 128 bytes, MaxReadReq 128 bytes > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- > LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0<1us, L1<8us > ClockPM- Surprise- LLActRep+ BwNot+ > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk- > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt- > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- > Slot #3, PowerLimit 
10.000W; Interlock- NoCompl+
>     SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
>         Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
>     SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>         Changed: MRL- PresDet+ LinkState+

That's probably because of the IO_PAGE_FAULT.

Thanks,
Wei

> serveerstertje:~# lspci -t
> -[0000:00]-+-00.0
>            +-00.2
>            +-02.0-[0b]----00.0
>            +-03.0-[0a]--+-00.0
>            |            +-00.1
>            |            +-00.2
>            |            +-00.3
>            |            +-00.4
>            |            +-00.5
>            |            +-00.6
>            |            \-00.7
>            +-05.0-[09]----00.0
>            +-06.0-[08]----00.0
>            +-0a.0-[07]----00.0
>            +-0b.0-[06]--+-00.0
>            |            \-00.1
>            +-0c.0-[05]----00.0
>            +-0d.0-[04]--+-00.0
>            |            +-00.1
>            |            +-00.2
>            |            +-00.3
>            |            +-00.4
>            |            +-00.5
>            |            +-00.6
>            |            \-00.7
>            +-11.0
>            +-12.0
>            +-12.2
>            +-13.0
>            +-13.2
>            +-14.0
>            +-14.3
>            +-14.4-[03]----06.0
>            +-14.5
>            +-15.0-[02]--
>            +-16.0
>            +-16.2
>            +-18.0
>            +-18.1
>            +-18.2
>            +-18.3
>            \-18.4
>
>
>> Thanks,
>> Wei
>
>>> I will try to make a complete package, and try with one pv domain only where the devices are being passed through just to simplify the setup.
>>>
>>>
>>>> * I would also like to know the symptoms of device 0x0700 when IO_PF
>>>> happened. Did it stop working?
>>>
>>> Yes it stops working, the video capture just freezes, but the driver doesn't bail out.
>>> For the USB controller (0x0a06) it starts to give errors for usbdev_open in the guest.
>>>
>>>> (BTW: I copied a few options from your boot cmd line and it worked with
>>>> my RD890 system
>>>
>>>> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps
>>>> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug
>>>> apic=debug iommu=on,verbose,debug,no-sharept
>>>
>>>> * so, what OEM board you have?)
>>>
>>> MSI 890FXA-GD70
>>>
>>>> Also from your log, these lines looks very strange:
>>>
>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>> read-only memory page.
gfn=0xd5, mfn=0xa4a11 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xd7, mfn=0xa4a0f >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xd9, mfn=0xa4a0d >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xdb, mfn=0xa4a0b >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xdd, mfn=0xa4a09 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xdf, mfn=0xa4a07 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe1, mfn=0xa4a05 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe3, mfn=0xa4a03 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe5, mfn=0xa4a01 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe7, mfn=0xa463f >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe9, mfn=0xa463d >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xeb, mfn=0xa463b >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xed, mfn=0xa4639 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. 
gfn=0xef, mfn=0xa4637 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id >>>> = 0x0a06, fault address = 0xc2c2c2c0 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>> id = 0x0700, fault address = 0xa90f8300 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>> id = 0x0700, fault address = 0xa90f8340 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>> id = 0x0700, fault address = 0xa90f8380 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>> id = 0x0700, fault address = 0xa90f83c0 >>> >>>> * they are just followed by the IO PAGE fault. Do you know where are >>>> they from? Your video card driver maybe? >>> >>> From a HVM domain with a old (3.0.3) kernel, but the faults also occur without this domain being started. >>> >>> >>>> Thanks, >>>> Wei >>> >>> >>>>> Complete xl dmesg and lspci -vvvknn attached. >>>>> >>>>> Thx >>>>> >>>>> -- >>>>> Sander >>> >>> >>> >>> >>> > >
Andrew Cooper
2012-Sep-07 09:17 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults (off topic - pci devices)
On 07/09/12 08:32, Sander Eikelenboom wrote:> Thursday, September 6, 2012, 5:03:05 PM, you wrote: > >> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote: >>> Thursday, September 6, 2012, 3:32:51 PM, you wrote: >>> >>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote: >>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote: >>>>> >>>>>> Hi Jan, >>>>>> Attached patch dumps io page fault flags. The flags show the reason of >>>>>> the fault and tell us if this is an unmapped interrupt fault or a DMA fault. >>>>>> Thanks, >>>>>> Wei >>>>>> signed-off-by: Wei Wang<wei.wang2@amd.com> >>>>> >>>>> I have applied the patch and the flags seem to differ between the faults: >>>>> >>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000 >>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000 >>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020 >>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020 >>>> OK, so they are not interrupt requests. I guess further information from >>>> your system would be helpful to debug this issue: >>>> 1) xl info >>>> 2) xl list >>>> 3) lscpi -vvv (NOTE: not in dom0 but in your guest) >>>> 4) cat /proc/iomem (in both dom0 and your hvm guest) >>> dom14 is not a HVM guest,it''s a PV guest. >> Ah, I see. PV guest is quite different than hvm, it does use p2m tables >> as io page tables. So no-sharept option does not work in this case. PV >> guests always use separated io page tables. There might be some >> incorrect mappings on the page tables. I will check this on my side. > I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept everything else the same. > I haven''t seen any IO PAGE FAULTS after that. 
>
> I did spot some differences in the output from lspci between xen 4.1 and 4.2, related to MSI enabled or not for the IOMMU device.
> Have attached the xl/xm dmesg and lspci from booting with both versions.
>
> lspci:
>
> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>     Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>     Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0
>     Interrupt: pin A routed to IRQ 10
>     Capabilities: [40] Secure device <?>
> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>     Address: 00000000fee0100c Data: 4128
>     Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>
> Although it seems enabled, shouldn't the IRQ number used be much higher than 10 for MSI interrupts ?

For compatibility reasons, all real PCI devices have to have the ability to fall back to legacy line-level interrupts. This is the IRQ10 which you see, which is #INTA in perhaps more recognizable notation. The line interrupt(s) will only be used if MSI and MSI-X interrupts are disabled.

You should find that all devices in lspci show between 1 and 4 #INTs (A through D), with the exception of SR-IOV virtual functions, which are specified to support only MSI/MSI-X.

~Andrew

>
> There is another difference in the bridge device that's in front of the 0a:00.6 device that faults before the kernel is even booted.
> > 00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port C) [1002:5a17] (prog-if 00 [Normal decode]) > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ > 4.1: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > 4.2: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx- > Latency: 0, Cache Line Size: 64 bytes > Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0 > I/O behind bridge: 0000f000-00000fff > Memory behind bridge: f9f00000-f9ffffff > Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff > 4.1: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- > 4.2: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- <SERR- <PERR- > BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B- > PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- > Capabilities: [50] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00 > DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us > ExtTag+ RBE+ FLReset- > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ > MaxPayload 128 bytes, MaxReadReq 128 bytes > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- > LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <1us, L1 <8us > ClockPM- Surprise- LLActRep+ BwNot+ > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk- > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt- > SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- > 
Slot #3, PowerLimit 10.000W; Interlock- NoCompl+ > SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- > Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock- > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- > Changed: MRL- PresDet+ LinkState+ > > serveerstertje:~# lspci -t > -[0000:00]-+-00.0 > +-00.2 > +-02.0-[0b]----00.0 > +-03.0-[0a]--+-00.0 > | +-00.1 > | +-00.2 > | +-00.3 > | +-00.4 > | +-00.5 > | +-00.6 > | \-00.7 > +-05.0-[09]----00.0 > +-06.0-[08]----00.0 > +-0a.0-[07]----00.0 > +-0b.0-[06]--+-00.0 > | \-00.1 > +-0c.0-[05]----00.0 > +-0d.0-[04]--+-00.0 > | +-00.1 > | +-00.2 > | +-00.3 > | +-00.4 > | +-00.5 > | +-00.6 > | \-00.7 > +-11.0 > +-12.0 > +-12.2 > +-13.0 > +-13.2 > +-14.0 > +-14.3 > +-14.4-[03]----06.0 > +-14.5 > +-15.0-[02]-- > +-16.0 > +-16.2 > +-18.0 > +-18.1 > +-18.2 > +-18.3 > \-18.4 > > > > > >> Thanks, >> Wei >>> I will try to make a complete package, and try with one pv domain only where the devices are being passed through just to simplify the setup. >>> >>> >>>> * I would also like to know the symptoms of device 0x0700 when IO_PF >>>> happened. Did it stop working? >>> Yes it stops working, the video capture just freezes, but the driver doesn''t bail out. >>> For the USB controller (0x0a06) it starts to give errors for usbdev_open in the guest. >>> >>>> (BTW: I copied a few options from your boot cmd line and it worked with >>>> my RD890 system >>>> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps >>>> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug >>>> apic=debug iommu=on,verbose,debug,no-sharept >>>> * so, what OEM board you have?) >>> MSI 890FXA-GD70 >>> >>>> Also from your log, these lines looks very strange: >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xd5, mfn=0xa4a11 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. 
gfn=0xd7, mfn=0xa4a0f >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xd9, mfn=0xa4a0d >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xdb, mfn=0xa4a0b >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xdd, mfn=0xa4a09 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xdf, mfn=0xa4a07 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe1, mfn=0xa4a05 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe3, mfn=0xa4a03 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe5, mfn=0xa4a01 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe7, mfn=0xa463f >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xe9, mfn=0xa463d >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xeb, mfn=0xa463b >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. gfn=0xed, mfn=0xa4639 >>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>> read-only memory page. 
gfn=0xef, mfn=0xa4637 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id >>>> = 0x0a06, fault address = 0xc2c2c2c0 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>> id = 0x0700, fault address = 0xa90f8300 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>> id = 0x0700, fault address = 0xa90f8340 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>> id = 0x0700, fault address = 0xa90f8380 >>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>> id = 0x0700, fault address = 0xa90f83c0 >>>> * they are just followed by the IO PAGE fault. Do you know where are >>>> they from? Your video card driver maybe? >>> From a HVM domain with a old (3.0.3) kernel, but the faults also occur without this domain being started. >>> >>> >>>> Thanks, >>>> Wei >>> >>>>> Complete xl dmesg and lspci -vvvknn attached. >>>>> >>>>> Thx >>>>> >>>>> -- >>>>> Sander >>> >>> >>> >>> >-- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
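To make the IRQ 10 versus MSI point above concrete: the target of an MSI is encoded in the Address/Data pair that lspci prints, not in the legacy IRQ number. What follows is only a minimal sketch, assuming the standard x86 MSI message layout (vector in data bits 7:0, delivery mode in data bits 10:8, destination APIC ID in address bits 19:12) and assuming lspci prints both values in hexadecimal. The helper name `decode_msi` is hypothetical and is not part of Xen or pciutils.

```python
def decode_msi(address: int, data: int) -> dict:
    """Split an x86 MSI address/data pair into its architectural fields.

    Assumes the standard x86 MSI message format; field names are
    illustrative only.
    """
    return {
        # Address register fields
        "dest_id": (address >> 12) & 0xFF,        # destination APIC ID
        "redirection_hint": (address >> 3) & 1,
        "dest_mode_logical": (address >> 2) & 1,
        # Data register fields
        "vector": data & 0xFF,                    # interrupt vector
        "delivery_mode": (data >> 8) & 0x7,       # 0 = fixed, 1 = lowest priority
        "level_assert": (data >> 14) & 1,
        "trigger_level": (data >> 15) & 1,        # 0 = edge, 1 = level
    }


# Values taken from the 4.2 lspci output quoted in this thread.
msi = decode_msi(0x00000000FEE0100C, 0x4128)
print(
    f"vector=0x{msi['vector']:02x} "
    f"dest_id={msi['dest_id']} "
    f"delivery_mode={msi['delivery_mode']}"
)
```

Under those assumptions the IOMMU's MSI is delivered on vector 0x28, which is unrelated to the IRQ 10 shown for the INTx fallback.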
>>> On 07.09.12 at 09:32, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+

No surprise you're not seeing any faults on 4.1 - there's no way they could get reported. I'm somewhat hesitant to pull the workaround patch from 4.2 into 4.1, as it's wrong for the kernel to touch the MSI setting of the IOMMU (which is under the control of Xen) in the first place, but the kernel side patch I had submitted a while ago wasn't received well. And that patch isn't really small, and it would remain to be seen what other dependencies it would have...

Jan
Sander Eikelenboom
2012-Sep-07 10:00 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Friday, September 7, 2012, 11:53:36 AM, you wrote:

>>>> On 07.09.12 at 09:32, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
>> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+

> No surprise you're not seeing any faults on 4.1 - there's no way
> they could get reported. I'm somewhat hesitant to pull the
> workaround patch from 4.2 into 4.1, as it's wrong for the kernel
> to touch the MSI setting of the IOMMU (which is under the
> control of Xen) in the first place, but the kernel side patch I had
> submitted a while ago wasn't received well. And that patch isn't
> really small, and it would remain to be seen what other
> dependencies it would have...

OK, so that would mean that in the 4.1 case the IOMMU is doing nothing?

Because if the IOMMU is working, I would say:
a) isn't it a bit strange that everything keeps working, although it should report the IO PAGE FAULT?
b) is the IO PAGE FAULT correct anyway, since the device keeps working fine?

> Jan
Sander Eikelenboom
2012-Sep-07 10:01 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Friday, September 7, 2012, 10:54:40 AM, you wrote:> On 09/07/2012 09:32 AM, Sander Eikelenboom wrote: >> >> Thursday, September 6, 2012, 5:03:05 PM, you wrote: >> >>> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote: >>>> >>>> Thursday, September 6, 2012, 3:32:51 PM, you wrote: >>>> >>>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote: >>>>>> >>>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote: >>>>>> >>>>>>> Hi Jan, >>>>>>> Attached patch dumps io page fault flags. The flags show the reason of >>>>>>> the fault and tell us if this is an unmapped interrupt fault or a DMA fault. >>>>>> >>>>>>> Thanks, >>>>>>> Wei >>>>>> >>>>>>> signed-off-by: Wei Wang<wei.wang2@amd.com> >>>>>> >>>>>> >>>>>> I have applied the patch and the flags seem to differ between the faults: >>>>>> >>>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000 >>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000 >>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020 >>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020 >>>> >>>>> OK, so they are not interrupt requests. I guess further information from >>>>> your system would be helpful to debug this issue: >>>>> 1) xl info >>>>> 2) xl list >>>>> 3) lscpi -vvv (NOTE: not in dom0 but in your guest) >>>>> 4) cat /proc/iomem (in both dom0 and your hvm guest) >>>> >>>> dom14 is not a HVM guest,it''s a PV guest. >> >>> Ah, I see. PV guest is quite different than hvm, it does use p2m tables >>> as io page tables. So no-sharept option does not work in this case. PV >>> guests always use separated io page tables. There might be some >>> incorrect mappings on the page tables. I will check this on my side. 
>>
>> I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept everything else the same.
>> I haven't seen any IO PAGE FAULTS after that.
>>
>> I did spot some differences in the output from lspci between xen 4.1 and 4.2, related to MSI enabled or not for the IOMMU device.
>> Have attached the xl/xm dmesg and lspci from booting with both versions.
>>
>> lspci:
>>
>> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>>     Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>>     Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>     Latency: 0
>>     Interrupt: pin A routed to IRQ 10
>>     Capabilities: [40] Secure device <?>
>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+

> Eh... That is interesting. So which dom0 are you using? There is a c/s
> in 4.2 to prevent recent dom0 to disable iommu interrupt (changeset
> 25492:61844569a432) Otherwise, iommu cannot send any events including IO
> PAGE faults. You could try to revert dom0 to an old version like 2.6
> pv_ops to see if you really have no io page faults on 4.1

OK, I will give that a try. Only dom0 will have to be a 2.6 pv_ops kernel, I assume?

>> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>     Address: 00000000fee0100c Data: 4128
>>     Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>>
>> Although it seems enabled, shouldn't the IRQ number used be much higher than 10 for MSI interrupts ?

> The IRQ number is fine. MSI vector is stored at Data: 4128

>>
>> There is another difference in the bridge device that's in front of the 0a:00.6 device that faults before the kernel is even booted.
>> >> 00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port C) [1002:5a17] (prog-if 00 [Normal decode]) >> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+ >> 4.1: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort->SERR-<PERR- INTx- >> 4.2: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort+<MAbort->SERR-<PERR- INTx- >> Latency: 0, Cache Line Size: 64 bytes >> Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0 >> I/O behind bridge: 0000f000-00000fff >> Memory behind bridge: f9f00000-f9ffffff >> Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff >> 4.1: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort-<SERR-<PERR- >> 4.2: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast>TAbort+<TAbort-<MAbort-<SERR-<PERR- >> BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort->Reset- FastB2B- >> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- >> Capabilities: [50] Power Management version 3 >> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) >> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- >> Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00 >> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s<64ns, L1<1us >> ExtTag+ RBE+ FLReset- >> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- >> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ >> MaxPayload 128 bytes, MaxReadReq 128 bytes >> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- >> LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0<1us, L1<8us >> ClockPM- Surprise- LLActRep+ BwNot+ >> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk- >> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- >> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt- >> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- 
>> Slot #3, PowerLimit 10.000W; Interlock- NoCompl+ >> SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- >> Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock- >> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- >> Changed: MRL- PresDet+ LinkState+> The probably because of the IO_PAGE_FAULT.> Thanks, > Wei>> serveerstertje:~# lspci -t >> -[0000:00]-+-00.0 >> +-00.2 >> +-02.0-[0b]----00.0 >> +-03.0-[0a]--+-00.0 >> | +-00.1 >> | +-00.2 >> | +-00.3 >> | +-00.4 >> | +-00.5 >> | +-00.6 >> | \-00.7 >> +-05.0-[09]----00.0 >> +-06.0-[08]----00.0 >> +-0a.0-[07]----00.0 >> +-0b.0-[06]--+-00.0 >> | \-00.1 >> +-0c.0-[05]----00.0 >> +-0d.0-[04]--+-00.0 >> | +-00.1 >> | +-00.2 >> | +-00.3 >> | +-00.4 >> | +-00.5 >> | +-00.6 >> | \-00.7 >> +-11.0 >> +-12.0 >> +-12.2 >> +-13.0 >> +-13.2 >> +-14.0 >> +-14.3 >> +-14.4-[03]----06.0 >> +-14.5 >> +-15.0-[02]-- >> +-16.0 >> +-16.2 >> +-18.0 >> +-18.1 >> +-18.2 >> +-18.3 >> \-18.4 >> >> >> >> >> >>> Thanks, >>> Wei >> >>>> I will try to make a complete package, and try with one pv domain only where the devices are being passed through just to simplify the setup. >>>> >>>> >>>>> * I would also like to know the symptoms of device 0x0700 when IO_PF >>>>> happened. Did it stop working? >>>> >>>> Yes it stops working, the video capture just freezes, but the driver doesn''t bail out. >>>> For the USB controller (0x0a06) it starts to give errors for usbdev_open in the guest. >>>> >>>>> (BTW: I copied a few options from your boot cmd line and it worked with >>>>> my RD890 system >>>> >>>>> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps >>>>> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug >>>>> apic=debug iommu=on,verbose,debug,no-sharept >>>> >>>>> * so, what OEM board you have?) 
>>>> >>>> MSI 890FXA-GD70 >>>> >>>>> Also from your log, these lines looks very strange: >>>> >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xd5, mfn=0xa4a11 >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xd7, mfn=0xa4a0f >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xd9, mfn=0xa4a0d >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xdb, mfn=0xa4a0b >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xdd, mfn=0xa4a09 >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xdf, mfn=0xa4a07 >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xe1, mfn=0xa4a05 >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xe3, mfn=0xa4a03 >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xe5, mfn=0xa4a01 >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xe7, mfn=0xa463f >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xe9, mfn=0xa463d >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xeb, mfn=0xa463b >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. gfn=0xed, mfn=0xa4639 >>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to >>>>> read-only memory page. 
gfn=0xef, mfn=0xa4637 >>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id >>>>> = 0x0a06, fault address = 0xc2c2c2c0 >>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>>> id = 0x0700, fault address = 0xa90f8300 >>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>>> id = 0x0700, fault address = 0xa90f8340 >>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>>> id = 0x0700, fault address = 0xa90f8380 >>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device >>>>> id = 0x0700, fault address = 0xa90f83c0 >>>> >>>>> * they are just followed by the IO PAGE fault. Do you know where are >>>>> they from? Your video card driver maybe? >>>> >>>> From a HVM domain with a old (3.0.3) kernel, but the faults also occur without this domain being started. >>>> >>>> >>>>> Thanks, >>>>> Wei >>>> >>>> >>>>>> Complete xl dmesg and lspci -vvvknn attached. >>>>>> >>>>>> Thx >>>>>> >>>>>> -- >>>>>> Sander >>>> >>>> >>>> >>>> >>>> >> >>
>>> On 07.09.12 at 12:00, Sander Eikelenboom <linux@eikelenboom.it> wrote:
> Friday, September 7, 2012, 11:53:36 AM, you wrote:
>
>>>>> On 07.09.12 at 09:32, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
>>> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>
>> No surprise you're not seeing any faults on 4.1 - there's no way
>> they could get reported. I'm somewhat hesitant to pull the
>> workaround patch from 4.2 into 4.1, as it's wrong for the kernel
>> to touch the MSI setting of the IOMMU (which is under the
>> control of Xen) in the first place, but the kernel side patch I had
>> submitted a while ago wasn't received well. And that patch isn't
>> really small, and it would remain to be seen what other
>> dependencies it would have...
>
> Ok so that would mean that in the 4.1 case, the IOMMU is doing nothing ?

No, it just can't report faults (they would need to be polled for). Also, saying "in the 4.1 case" is wrong here - this really depends on whether you have an affected Dom0 kernel. Things work fine if the Dom0 kernel doesn't trample over Xen's setup.

Jan
Sander Eikelenboom
2012-Sep-07 10:15 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Friday, September 7, 2012, 12:06:31 PM, you wrote:

>>>> On 07.09.12 at 12:00, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>> Friday, September 7, 2012, 11:53:36 AM, you wrote:
>>
>>>>>> On 07.09.12 at 09:32, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>>>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
>>>> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>
>>> No surprise you're not seeing any faults on 4.1 - there's no way
>>> they could get reported. I'm somewhat hesitant to pull the
>>> workaround patch from 4.2 into 4.1, as it's wrong for the kernel
>>> to touch the MSI setting of the IOMMU (which is under the
>>> control of Xen) in the first place, but the kernel side patch I had
>>> submitted a while ago wasn't received well. And that patch isn't
>>> really small, and it would remain to be seen what other
>>> dependencies it would have...
>>
>> Ok so that would mean that in the 4.1 case, the IOMMU is doing nothing ?

> No, it just can't report faults (they would need to be polled for).
> Also, saying "in the 4.1 case" is wrong here - this really depends
> on whether you have an affected Dom0 kernel. Things work fine
> if the Dom0 kernel doesn't trample over Xen's setup.

Except for the IO PAGE FAULT in xl dmesg, which is reported before the kernel even gets loaded with xen-4.2.
I don't see that one on 4.1 either, so that wouldn't add up to it being a dom0 kernel problem only ...

> Jan
>>> On 07.09.12 at 12:15, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>> No, it just can't report faults (they would need to be polled for).
>> Also, saying "in the 4.1 case" is wrong here - this really depends
>> on whether you have an affected Dom0 kernel. Things work fine
>> if the Dom0 kernel doesn't trample over Xen's setup.
>
> Except for the IO PAGE FAULT in xl dmesg, which is reported before the
> kernel even gets loaded with xen-4.2.
> I don't see that one on 4.1 either, so that wouldn't add up to it being a
> dom0 kernel problem only ...

Oh, yes, that certainly is a hypervisor only bug (if a bug under
our control at all - as said before, a babbling device - eg left
active by BIOS - is likely beyond our control).

Jan
>>> On 07.09.12 at 12:01, Sander Eikelenboom <linux@eikelenboom.it> wrote:
>>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
>
>> Eh... That is interesting. So which dom0 are you using? There is a c/s
>> in 4.2 to prevent recent dom0 to disable iommu interrupt (changeset
>> 25492:61844569a432) Otherwise, iommu cannot send any events including IO
>> PAGE faults. You could try to revert dom0 to an old version like 2.6
>> pv_ops to see if you really have no io page faults on 4.1
>
> Ok i will give that a try, only dom0 will have to be a 2.6 pv_ops i assume ?

You could also drop the patch below into a kernel that has the
problematic change. Will require

#define PCI_CLASS_SYSTEM_IOMMU 0x0806

to be added at a suitable spot in include/linux/pci_ids.h.

Jan

--- head.orig/drivers/pci/msi.c
+++ head/drivers/pci/msi.c
@@ -20,6 +20,7 @@
 #include <linux/errno.h>
 #include <linux/io.h>
 #include <linux/slab.h>
+#include <xen/xen.h>
 
 #include "pci.h"
 #include "msi.h"
@@ -1022,7 +1023,13 @@ void pci_msi_init_pci_dev(struct pci_dev
 	/* Disable the msi hardware to avoid screaming interrupts
 	 * during boot. This is the power on reset default so
 	 * usually this should be a noop.
+	 * But on a Xen host don't do this for IOMMUs which the hypervisor
+	 * is in control of (and hence has already enabled on purpose).
 	 */
+	if (xen_initial_domain()
+	    && (dev->class >> 8) == PCI_CLASS_SYSTEM_IOMMU
+	    && dev->vendor == PCI_VENDOR_ID_AMD)
+		return;
 	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
 	if (pos)
 		msi_set_enable(dev, pos, 0);
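The device test in the patch above relies on how the PCI class code is packed: the 24-bit value holds base class, subclass and programming interface, so shifting right by 8 leaves the 16-bit class/subclass pair (0x0806 for an IOMMU). Below is a user-space sketch of just that comparison. It is an illustration under stated assumptions, not the kernel code: `is_amd_iommu` is a hypothetical name, the `xen_initial_domain()` part of the condition is left out, and the vendor constants are the usual `pci_ids.h` values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_CLASS_SYSTEM_IOMMU 0x0806u  /* value given in the mail above */
#define PCI_VENDOR_ID_AMD      0x1022u  /* as defined in pci_ids.h */
#define PCI_VENDOR_ID_ATI      0x1002u  /* as defined in pci_ids.h */

/*
 * Hypothetical helper mirroring the class/vendor part of the patch's test.
 * class_code is laid out as (base class << 16) | (subclass << 8) | prog-if,
 * which is how the kernel stores it in struct pci_dev.
 */
bool is_amd_iommu(uint32_t class_code, uint16_t vendor)
{
    return (class_code >> 8) == PCI_CLASS_SYSTEM_IOMMU
           && vendor == PCI_VENDOR_ID_AMD;
}

/* Usage example. */
void is_amd_iommu_example(void)
{
    /* Class 0x0806, prog-if 0x00, AMD vendor ID: matches. */
    assert(is_amd_iommu(0x080600u, PCI_VENDOR_ID_AMD));
    /* A PCI-to-PCI bridge (class 0x0604) never matches. */
    assert(!is_amd_iommu(0x060400u, PCI_VENDOR_ID_AMD));
    /* Same class, ATI vendor ID: does not match this comparison. */
    assert(!is_amd_iommu(0x080600u, PCI_VENDOR_ID_ATI));
}
```

Note that the lspci output quoted elsewhere in this thread lists the RD990 IOMMU as [1002:5a23], i.e. vendor ID 0x1002, which the vendor comparison in this sketch would not match. Whether that matters for the real patch is not something this sketch can settle.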
Konrad Rzeszutek Wilk
2012-Sep-07 20:51 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
On Fri, Sep 07, 2012 at 12:01:33PM +0200, Sander Eikelenboom wrote:
>
> Friday, September 7, 2012, 10:54:40 AM, you wrote:
>
> > On 09/07/2012 09:32 AM, Sander Eikelenboom wrote:
> >>
> >> Thursday, September 6, 2012, 5:03:05 PM, you wrote:
> >>
> >>> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote:
> >>>>
> >>>> Thursday, September 6, 2012, 3:32:51 PM, you wrote:
> >>>>
> >>>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote:
> >>>>>>
> >>>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote:
> >>>>>>
> >>>>>>> Hi Jan,
> >>>>>>> Attached patch dumps io page fault flags. The flags show the reason of
> >>>>>>> the fault and tell us if this is an unmapped interrupt fault or a DMA fault.
> >>>>>>
> >>>>>>> Thanks,
> >>>>>>> Wei
> >>>>>>
> >>>>>>> signed-off-by: Wei Wang<wei.wang2@amd.com>
> >>>>>>
> >>>>>>
> >>>>>> I have applied the patch and the flags seem to differ between the faults:
> >>>>>>
> >>>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
> >>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
> >>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020
> >>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020
> >>>>
> >>>>> OK, so they are not interrupt requests. I guess further information from
> >>>>> your system would be helpful to debug this issue:
> >>>>> 1) xl info
> >>>>> 2) xl list
> >>>>> 3) lscpi -vvv (NOTE: not in dom0 but in your guest)
> >>>>> 4) cat /proc/iomem (in both dom0 and your hvm guest)
> >>>>
> >>>> dom14 is not a HVM guest,it's a PV guest.
> >>
> >>> Ah, I see. PV guest is quite different than hvm, it does use p2m tables
> >>> as io page tables. So no-sharept option does not work in this case. PV
> >>> guests always use separated io page tables. There might be some
> >>> incorrect mappings on the page tables. I will check this on my side.
> >>
> >> I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept everything else the same.
> >> I haven't seen any IO PAGE FAULTS after that.
> >>
> >> I did spot some differences in the output from lspci between xen 4.1 and 4.2, related to MSI enabled or not for the IOMMU device.
> >> Have attached the xl/xm dmesg and lspci from booting with both versions.
> >>
> >> lspci:
> >>
> >> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
> >> Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
> >> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> >> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort->SERR-<PERR- INTx-
> >> Latency: 0
> >> Interrupt: pin A routed to IRQ 10
> >> Capabilities: [40] Secure device<?>
> >> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
>
> > Eh... That is interesting. So which dom0 are you using? There is a c/s
> > in 4.2 to prevent recent dom0 to disable iommu interrupt (changeset
> > 25492:61844569a432) Otherwise, iommu cannot send any events including IO
> > PAGE faults. You could try to revert dom0 to an old version like 2.6
> > pv_ops to see if you really have no io page faults on 4.1
>
> Ok i will give that a try, only dom0 will have to be a 2.6 pv_ops i assume ?
>

So the failure they are describing is due to:

http://lists.xen.org/archives/html/xen-devel/2012-06/msg00668.html

Or you can use the patch that Jan posted
http://lists.xen.org/archives/html/xen-devel/2012-06/msg01196.html
and use the existing kernel..

But more interesting - is this device (00:00.2) in the Xen-pciback.hide arguments
(if not, then don't worry)?
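The quoted log lines carry `flags = 0x000` and `flags = 0x020`, and the reply reads them as "not interrupt requests". The following is a rough sketch of how such a flags value could be picked apart. The bit positions are an assumption recalled from the AMD IOMMU event-log layout (I = 0x008, PR = 0x010, RW = 0x020 within the 12-bit flags field); they are not taken from the patch under discussion and should be checked against the AMD IOMMU specification. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED bit positions inside the 12-bit IO_PAGE_FAULT flags value. */
#define IOPF_FLAG_I   0x008u  /* 1 = interrupt request, 0 = memory (DMA) access */
#define IOPF_FLAG_PR  0x010u  /* 1 = page was marked present */
#define IOPF_FLAG_RW  0x020u  /* 1 = write access, 0 = read access */

/* Hypothetical helper type, for illustration only. */
struct iopf_info {
    bool is_interrupt;
    bool present;
    bool is_write;
};

struct iopf_info iopf_decode(uint16_t flags)
{
    struct iopf_info info;

    info.is_interrupt = (flags & IOPF_FLAG_I) != 0;
    info.present = (flags & IOPF_FLAG_PR) != 0;
    info.is_write = (flags & IOPF_FLAG_RW) != 0;
    return info;
}

/* Usage example with the two values seen in the log. */
void iopf_example(void)
{
    /* 0x020: DMA write to a page that is not present. */
    assert(!iopf_decode(0x020).is_interrupt);
    assert(iopf_decode(0x020).is_write);
    /* 0x000: DMA read from a page that is not present. */
    assert(!iopf_decode(0x000).is_interrupt);
    assert(!iopf_decode(0x000).is_write);
}
```

If the assumed layout holds, neither value has the interrupt bit set, which is consistent with the reading given in the reply; the two devices would differ only in whether the faulting access was a read or a write.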
Sander Eikelenboom
2012-Sep-24 08:38 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Friday, September 7, 2012, 10:54:40 AM, you wrote:

> On 09/07/2012 09:32 AM, Sander Eikelenboom wrote:
>>
>> Thursday, September 6, 2012, 5:03:05 PM, you wrote:
>>
>>> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote:
>>>>
>>>> Thursday, September 6, 2012, 3:32:51 PM, you wrote:
>>>>
>>>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote:
>>>>>>
>>>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote:
>>>>>>
>>>>>>> Hi Jan,
>>>>>>> Attached patch dumps io page fault flags. The flags show the reason of
>>>>>>> the fault and tell us if this is an unmapped interrupt fault or a DMA fault.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Wei
>>>>>>
>>>>>>> signed-off-by: Wei Wang<wei.wang2@amd.com>
>>>>>>
>>>>>>
>>>>>> I have applied the patch and the flags seem to differ between the faults:
>>>>>>
>>>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020
>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020
>>>>
>>>>> OK, so they are not interrupt requests. I guess further information from
>>>>> your system would be helpful to debug this issue:
>>>>> 1) xl info
>>>>> 2) xl list
>>>>> 3) lscpi -vvv (NOTE: not in dom0 but in your guest)
>>>>> 4) cat /proc/iomem (in both dom0 and your hvm guest)
>>>>
>>>> dom14 is not a HVM guest,it's a PV guest.
>>
>>> Ah, I see. PV guest is quite different than hvm, it does use p2m tables
>>> as io page tables. So no-sharept option does not work in this case. PV
>>> guests always use separated io page tables. There might be some
>>> incorrect mappings on the page tables. I will check this on my side.
>>
>> I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept everything else the same.
>> I haven't seen any IO PAGE FAULTS after that.
>>
>> I did spot some differences in the output from lspci between xen 4.1 and 4.2, related to MSI enabled or not for the IOMMU device.
>> Have attached the xl/xm dmesg and lspci from booting with both versions.
>>
>> lspci:
>>
>> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>> Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort->SERR-<PERR- INTx-
>> Latency: 0
>> Interrupt: pin A routed to IRQ 10
>> Capabilities: [40] Secure device<?>
>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+

> Eh... That is interesting. So which dom0 are you using? There is a c/s
> in 4.2 to prevent recent dom0 to disable iommu interrupt (changeset
> 25492:61844569a432) Otherwise, iommu cannot send any events including IO
> PAGE faults. You could try to revert dom0 to an old version like 2.6
> pv_ops to see if you really have no io page faults on 4.1

Ok i finally got the time to do some more testing, tested 4.2 around that changeset, and made a copy of the guest using HVM instead of PV.

The results:
- On xen-4.1.* and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine, both in a HVM as a PV guest, i don't see IO page faults getting reported.
- On xen-4.2 changeset < 25492 and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine, both in a HVM as a PV guest, i don't see IO page faults getting reported.
- On xen-4.2 changeset > 25492 and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine for a short while (around 5 to 10 minutes) in a PV guest, after that IO page faults get reported and the video freezes, i don't see any errors in the guest though.
- On xen-unstable tip and a 3.6-rc6 kernel (dom0 and domU):
  PV: the video device passed through works fine for a short while (around 5 to 10 minutes), after that IO page faults get reported and the video freezes, i don't see any errors in the guest though.
  HVM: the video device passed through doesn't work from the start:
    - The device is there according to lspci
    - The video application start fine, but delivers a green image, so the device is not working properly. I don't see IO page faults though.

Attached are (all with xen-unstable tip and the guest as HVM (domain 15):
- xl dmesg
- Patch which adds some more info, but all values reported seem to be zero (see xl dmesg)
- lspci dom0
- lspci HVM guest

>> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> Address: 00000000fee0100c Data: 4128
>> Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>>
>> Although it seems enabled, shouldn't the IRQ number used be much higher than 10 for MSI interrupts ?

> The IRQ number is fine. MSI vector is stored at Data: 4128

>>
>> There is another difference in the bridge device that's in front of the 0a:00.6 device that faults before the kernel is even booted.
>>
>> 00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port C) [1002:5a17] (prog-if 00 [Normal decode])
>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>> 4.1: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort->SERR-<PERR- INTx-
>> 4.2: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort+<MAbort->SERR-<PERR- INTx-
>> Latency: 0, Cache Line Size: 64 bytes
>> Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
>> I/O behind bridge: 0000f000-00000fff
>> Memory behind bridge: f9f00000-f9ffffff
>> Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
>> 4.1: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort-<SERR-<PERR-
>> 4.2: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast>TAbort+<TAbort-<MAbort-<SERR-<PERR-
>> BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort->Reset- FastB2B-
>> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>> Capabilities: [50] Power Management version 3
>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>> Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s<64ns, L1<1us
>> ExtTag+ RBE+ FLReset-
>> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>> MaxPayload 128 bytes, MaxReadReq 128 bytes
>> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>> LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0<1us, L1<8us
>> ClockPM- Surprise- LLActRep+ BwNot+
>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
>> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
>> Slot #3, PowerLimit 10.000W; Interlock- NoCompl+
>> SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
>> Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
>> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>> Changed: MRL- PresDet+ LinkState+

> The probably because of the IO_PAGE_FAULT.

> Thanks,
> Wei

>> serveerstertje:~# lspci -t
>> -[0000:00]-+-00.0
>>            +-00.2
>>            +-02.0-[0b]----00.0
>>            +-03.0-[0a]--+-00.0
>>            |            +-00.1
>>            |            +-00.2
>>            |            +-00.3
>>            |            +-00.4
>>            |            +-00.5
>>            |            +-00.6
>>            |            \-00.7
>>            +-05.0-[09]----00.0
>>            +-06.0-[08]----00.0
>>            +-0a.0-[07]----00.0
>>            +-0b.0-[06]--+-00.0
>>            |            \-00.1
>>            +-0c.0-[05]----00.0
>>            +-0d.0-[04]--+-00.0
>>            |            +-00.1
>>            |            +-00.2
>>            |            +-00.3
>>            |            +-00.4
>>            |            +-00.5
>>            |            +-00.6
>>            |            \-00.7
>>            +-11.0
>>            +-12.0
>>            +-12.2
>>            +-13.0
>>            +-13.2
>>            +-14.0
>>            +-14.3
>>            +-14.4-[03]----06.0
>>            +-14.5
>>            +-15.0-[02]--
>>            +-16.0
>>            +-16.2
>>            +-18.0
>>            +-18.1
>>            +-18.2
>>            +-18.3
>>            \-18.4
>>
>>
>>> Thanks,
>>> Wei
>>
>>>> I will try to make a complete package, and try with one pv domain only where the devices are being passed through just to simplify the setup.
>>>>
>>>>
>>>>> * I would also like to know the symptoms of device 0x0700 when IO_PF
>>>>> happened. Did it stop working?
>>>>
>>>> Yes it stops working, the video capture just freezes, but the driver doesn't bail out.
>>>> For the USB controller (0x0a06) it starts to give errors for usbdev_open in the guest.
>>>>
>>>>> (BTW: I copied a few options from your boot cmd line and it worked with
>>>>> my RD890 system
>>>>
>>>>> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps
>>>>> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug
>>>>> apic=debug iommu=on,verbose,debug,no-sharept
>>>>
>>>>> * so, what OEM board you have?)
>>>>
>>>> MSI 890FXA-GD70
>>>>
>>>>> Also from your log, these lines looks very strange:
>>>>
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xd5, mfn=0xa4a11
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xd7, mfn=0xa4a0f
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xd9, mfn=0xa4a0d
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xdb, mfn=0xa4a0b
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xdd, mfn=0xa4a09
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xdf, mfn=0xa4a07
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xe1, mfn=0xa4a05
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xe3, mfn=0xa4a03
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xe5, mfn=0xa4a01
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xe7, mfn=0xa463f
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xe9, mfn=0xa463d
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xeb, mfn=0xa463b
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xed, mfn=0xa4639
>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>> read-only memory page. gfn=0xef, mfn=0xa4637
>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id
>>>>> = 0x0a06, fault address = 0xc2c2c2c0
>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>> id = 0x0700, fault address = 0xa90f8300
>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>> id = 0x0700, fault address = 0xa90f8340
>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>> id = 0x0700, fault address = 0xa90f8380
>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>> id = 0x0700, fault address = 0xa90f83c0
>>>>
>>>>> * they are just followed by the IO PAGE fault. Do you know where are
>>>>> they from? Your video card driver maybe?
>>>>
>>>> From a HVM domain with a old (3.0.3) kernel, but the faults also occur without this domain being started.
>>>>
>>>>
>>>>> Thanks,
>>>>> Wei
>>>>
>>>>
>>>>>> Complete xl dmesg and lspci -vvvknn attached.
>>>>>>
>>>>>> Thx
>>>>>>
>>>>>> --
>>>>>> Sander
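The `device id` values in the AMD-Vi log lines line up with the PCI addresses used elsewhere in this message (0x0a06 with the 0a:00.6 device behind bridge 00:03.0). A small sketch of that mapping follows, assuming the ID is the usual 16-bit requester ID with the bus in the upper byte, the device in bits 7:3 and the function in bits 2:0; `devid_to_bdf` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a 16-bit requester ID as bus:device.function, lspci style.
 * Returns what snprintf returns (the length the full string needs).
 */
int devid_to_bdf(uint16_t devid, char *buf, size_t len)
{
    unsigned int bus = devid >> 8;           /* bits 15:8 */
    unsigned int dev = (devid >> 3) & 0x1f;  /* bits 7:3  */
    unsigned int fn = devid & 0x7;           /* bits 2:0  */

    return snprintf(buf, len, "%02x:%02x.%x", bus, dev, fn);
}

/* Usage example with the two device IDs from the log. */
void devid_example(void)
{
    char bdf[16];

    devid_to_bdf(0x0a06, bdf, sizeof(bdf));
    assert(strcmp(bdf, "0a:00.6") == 0);

    devid_to_bdf(0x0700, bdf, sizeof(bdf));
    assert(strcmp(bdf, "07:00.0") == 0);
}
```

Both resulting addresses appear in the `lspci -t` tree quoted above: 0a:00.6 sits behind bridge 00:03.0 and 07:00.0 behind bridge 00:0a.0.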
On 09/24/2012 10:38 AM, Sander Eikelenboom wrote:
>
> Friday, September 7, 2012, 10:54:40 AM, you wrote:
>
>> On 09/07/2012 09:32 AM, Sander Eikelenboom wrote:
>>>
>>> Thursday, September 6, 2012, 5:03:05 PM, you wrote:
>>>
>>>> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote:
>>>>>
>>>>> Thursday, September 6, 2012, 3:32:51 PM, you wrote:
>>>>>
>>>>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote:
>>>>>>>
>>>>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote:
>>>>>>>
>>>>>>>> Hi Jan,
>>>>>>>> Attached patch dumps io page fault flags. The flags show the reason of
>>>>>>>> the fault and tell us if this is an unmapped interrupt fault or a DMA fault.
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Wei
>>>>>>>
>>>>>>>> signed-off-by: Wei Wang<wei.wang2@amd.com>
>>>>>>>
>>>>>>>
>>>>>>> I have applied the patch and the flags seem to differ between the faults:
>>>>>>>
>>>>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000
>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020
>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020
>>>>>
>>>>>> OK, so they are not interrupt requests. I guess further information from
>>>>>> your system would be helpful to debug this issue:
>>>>>> 1) xl info
>>>>>> 2) xl list
>>>>>> 3) lscpi -vvv (NOTE: not in dom0 but in your guest)
>>>>>> 4) cat /proc/iomem (in both dom0 and your hvm guest)
>>>>>
>>>>> dom14 is not a HVM guest,it's a PV guest.
>>>
>>>> Ah, I see. PV guest is quite different than hvm, it does use p2m tables
>>>> as io page tables. So no-sharept option does not work in this case. PV
>>>> guests always use separated io page tables. There might be some
>>>> incorrect mappings on the page tables. I will check this on my side.
>>>
>>> I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept everything else the same.
>>> I haven't seen any IO PAGE FAULTS after that.
>>>
>>> I did spot some differences in the output from lspci between xen 4.1 and 4.2, related to MSI enabled or not for the IOMMU device.
>>> Have attached the xl/xm dmesg and lspci from booting with both versions.
>>>
>>> lspci:
>>>
>>> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>>> Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23]
>>> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort->SERR-<PERR- INTx-
>>> Latency: 0
>>> Interrupt: pin A routed to IRQ 10
>>> Capabilities: [40] Secure device<?>
>>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+
>
>> Eh... That is interesting. So which dom0 are you using? There is a c/s
>> in 4.2 to prevent recent dom0 to disable iommu interrupt (changeset
>> 25492:61844569a432) Otherwise, iommu cannot send any events including IO
>> PAGE faults. You could try to revert dom0 to an old version like 2.6
>> pv_ops to see if you really have no io page faults on 4.1
>
> Ok i finally got the time to do some more testing, tested 4.2 around that changeset, and made a copy of the guest using HVM instead of PV.
>
> The results:
> - On xen-4.1.* and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine, both in a HVM as a PV guest, i don't see IO page faults getting reported.
> - On xen-4.2 changeset< 25492 and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine, both in a HVM as a PV guest, i don't see IO page faults getting reported.
> - On xen-4.2 changeset> 25492 and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine for a short while (around 5 to 10 minutes) in a PV guest, after that IO page faults get reported and the video freezes, i don't see any errors in the guest though.
> - On xen-unstable tip and a 3.6-rc6 kernel (dom0 and domU):
>   PV: the video device passed through works fine for a short while (around 5 to 10 minutes), after that IO page faults get reported and the video freezes, i don't see any errors in the guest though.
>   HVM: the video device passed through doesn't work from the start:
>     - The device is there according to lspci
>     - The video application start fine, but delivers a green image, so the device is not working properly. I don't see IO page faults though.
>
> Attached are (all with xen-unstable tip and the guest as HVM (domain 15):
> - xl dmesg
> - Patch which adds some more info, but all values reported seem to be zero (see xl dmesg)
> - lspci dom0
> - lspci HVM guest

HI,
Thanks for the information, very very helpful for debugging. I hope I
could start to look at this right after sending my next iommu patch
queue upstream...another question is: Did you see this issue on a single
pv/hvm guest system or you only saw it on a system with about 16 running
VMs?

Thanks,
Wei

>
>
>
>>> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>> Address: 00000000fee0100c Data: 4128
>>> Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>>>
>>> Although it seems enabled, shouldn't the IRQ number used be much higher than 10 for MSI interrupts ?
>
>> The IRQ number is fine. MSI vector is stored at Data: 4128
>
>>>
>>> There is another difference in the bridge device that's in front of the 0a:00.6 device that faults before the kernel is even booted.
Sander Eikelenboom
2012-Sep-24 12:27 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Monday, September 24, 2012, 2:24:16 PM, you wrote:

> On 09/24/2012 10:38 AM, Sander Eikelenboom wrote:
>> [...]
>> Ok i finally got the time to do some more testing, tested 4.2 around that changeset, and made a copy of the guest using HVM instead of PV.
>>
>> The results:
>> - On xen-4.1.* and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine, both in a HVM as a PV guest, i don't see IO page faults getting reported.
>> - On xen-4.2 changeset < 25492 and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine, both in a HVM as a PV guest, i don't see IO page faults getting reported.
>> - On xen-4.2 changeset > 25492 and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine for a short while (around 5 to 10 minutes) in a PV guest, after that IO page faults get reported and the video freezes, i don't see any errors in the guest though.
>> - On xen-unstable tip and a 3.6-rc6 kernel (dom0 and domU):
>> PV: the video device passed through works fine for a short while (around 5 to 10 minutes), after that IO page faults get reported and the video freezes, i don't see any errors in the guest though.
>> HVM: the video device passed through doesn't work from the start:
>> - The device is there according to lspci
>> - The video application start fine, but delivers a green image, so the device is not working properly. I don't see IO page faults though.
>>
>> Attached are (all with xen-unstable tip and the guest as HVM (domain 15):
>> - xl dmesg
>> - Patch which adds some more info, but all values reported seem to be zero (see xl dmesg)
>> - lspci dom0
>> - lspci HVM guest

> HI,
> Thanks for the information, very very helpful for debugging. I hope I
> could start to look at this right after sending my next iommu patch
> queue upstream...another question is: Did you see this issue on a single
> pv/hvm guest system or you only saw it on a system with about 16 running
> VMs?

> Thanks,
> Wei

If you need more info, I'm more than happy to run additional debug patches.
I haven't tested it with a single guest yet, will try right away.

Thanks !

--
Sander
Sander Eikelenboom
2012-Sep-24 21:08 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Monday, September 24, 2012, 2:24:16 PM, you wrote:

> On 09/24/2012 10:38 AM, Sander Eikelenboom wrote:
>> [...]

> HI,
> Thanks for the information, very very helpful for debugging. I hope I
> could start to look at this right after sending my next iommu patch
> queue upstream...another question is: Did you see this issue on a single
> pv/hvm guest system or you only saw it on a system with about 16 running
> VMs?

The issue of the hvm not giving a video image also happens when it's the first and only guest running after a cold boot.

> Thanks,
> Wei
Sander Eikelenboom
2012-Oct-01 15:02 UTC
Re: [PATCH] amd iommu: Dump flags of IO page faults
Monday, September 24, 2012, 2:24:16 PM, you wrote:> On 09/24/2012 10:38 AM, Sander Eikelenboom wrote: >> >> Friday, September 7, 2012, 10:54:40 AM, you wrote: >> >>> On 09/07/2012 09:32 AM, Sander Eikelenboom wrote: >>>> >>>> Thursday, September 6, 2012, 5:03:05 PM, you wrote: >>>> >>>>> On 09/06/2012 03:50 PM, Sander Eikelenboom wrote: >>>>>> >>>>>> Thursday, September 6, 2012, 3:32:51 PM, you wrote: >>>>>> >>>>>>> On 09/06/2012 12:59 AM, Sander Eikelenboom wrote: >>>>>>>> >>>>>>>> Wednesday, September 5, 2012, 4:42:42 PM, you wrote: >>>>>>>> >>>>>>>>> Hi Jan, >>>>>>>>> Attached patch dumps io page fault flags. The flags show the reason of >>>>>>>>> the fault and tell us if this is an unmapped interrupt fault or a DMA fault. >>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Wei >>>>>>>> >>>>>>>>> signed-off-by: Wei Wang<wei.wang2@amd.com> >>>>>>>> >>>>>>>> >>>>>>>> I have applied the patch and the flags seem to differ between the faults: >>>>>>>> >>>>>>>> AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000 >>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id = 0x0a06, fault address = 0xc2c2c2c0, flags = 0x000 >>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d339e0, flags = 0x020 >>>>>>>> (XEN) [2012-09-05 20:54:16] AMD-Vi: IO_PAGE_FAULT: domain = 14, device id = 0x0700, fault address = 0xa8d33a40, flags = 0x020 >>>>>> >>>>>>> OK, so they are not interrupt requests. I guess further information from >>>>>>> your system would be helpful to debug this issue: >>>>>>> 1) xl info >>>>>>> 2) xl list >>>>>>> 3) lscpi -vvv (NOTE: not in dom0 but in your guest) >>>>>>> 4) cat /proc/iomem (in both dom0 and your hvm guest) >>>>>> >>>>>> dom14 is not a HVM guest,it''s a PV guest. >>>> >>>>> Ah, I see. PV guest is quite different than hvm, it does use p2m tables >>>>> as io page tables. So no-sharept option does not work in this case. 
PV >>>>> guests always use separated io page tables. There might be some >>>>> incorrect mappings on the page tables. I will check this on my side. >>>> >>>> I have reverted the machine to xen-4.1.4-pre (changeset 23353) and kept everything else the same. >>>> I haven''t seen any IO PAGE FAULTS after that. >>>> >>>> I did spot some differences in the output from lspci between xen 4.1 and 4.2, related to MSI enabled or not for the IOMMU device. >>>> Have attached the xl/xm dmesg and lspci from booting with both versions. >>>> >>>> lspci: >>>> >>>> 00:00.2 Generic system peripheral [0806]: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23] >>>> Subsystem: ATI Technologies Inc RD990 I/O Memory Management Unit (IOMMU) [1002:5a23] >>>> Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- >>>> Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast>TAbort-<TAbort-<MAbort->SERR-<PERR- INTx- >>>> Latency: 0 >>>> Interrupt: pin A routed to IRQ 10 >>>> Capabilities: [40] Secure device<?> >>>> 4.1: Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit+ >> >>> Eh... That is interesting. So which dom0 are you using? There is a c/s >>> in 4.2 to prevent recent dom0 to disable iommu interrupt (changeset >>> 25492:61844569a432) Otherwise, iommu cannot send any events including IO >>> PAGE faults. You could try to revert dom0 to an old version like 2.6 >>> pv_ops to see if you really have no io page faults on 4.1 >> >> Ok i finally got the time to do some more testing, tested 4.2 around that changeset, and made a copy of the guest using HVM instead of PV. >> >> The results: >> - On xen-4.1.* and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine, both in a HVM as a PV guest, i don''t see IO page faults getting reported. 
>> - On xen-4.2 changeset< 25492 and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine, both in a HVM as a PV guest, i don''t see IO page faults getting reported. >> - On xen-4.2 changeset> 25492 and a 3.6-rc6 kernel (dom0 and domU): the video device passed through works fine for a short while (around 5 to 10 minutes) in a PV guest, after that IO page faults get reported and the video freezes, i don''t see any errors in the guest though. >> - On xen-unstable tip and a 3.6-rc6 kernel (dom0 and domU): >> PV: the video device passed through works fine for a short while (around 5 to 10 minutes), after that IO page faults get reported and the video freezes, i don''t see any errors in the guest though. >> HVM: the video device passed through doesn''t work from the start: >> - The device is there according to lspci >> - The video application start fine, but delivers a green image, so the device is not working properly. I don''t see IO page faults though. >> >> Attached are (all with xen-unstable tip and the guest as HVM (domain 15): >> - xl dmesg >> - Patch which adds some more info, but all values reported seem to be zero (see xl dmesg) >> - lspci dom0 >> - lspci HVM guest> HI, > Thanks for the information, very very helpful for debugging. I hope I > could start to look at this right after sending my next iommu patch > queue upstream...another question is: Did you see this issue on a single > pv/hvm guest system or you only saw it on a system with about 16 running > VMs?I have an update on this one... The green screen when using a HVM guest was due to the driver no being able to communicate with the device via I2C. This problem disappeared when updating to the latest xen-unstable and 3.6-rc7 kernel with additionally the linux-next branch from konrad''s tree pulled in. At the moment the HVM guest works: it shows video, it doesn''t give IO PAGE FAULT''s. 
Will try and see if it's also miraculously fixed for PV as well.

> Thanks,
> Wei

>>
>>>> 4.2: Capabilities: [54] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>> Address: 00000000fee0100c Data: 4128
>>>> Capabilities: [64] HyperTransport: MSI Mapping Enable+ Fixed+
>>>>
>>>> Although it seems enabled, shouldn't the IRQ number used be much higher than 10 for MSI interrupts?
>>
>>> The IRQ number is fine. MSI vector is stored at Data: 4128
>>
>>>>
>>>> There is another difference in the bridge device that's in front of the 0a:00.6 device that faults before the kernel is even booted.
>>>>
>>>> 00:03.0 PCI bridge [0604]: ATI Technologies Inc RD890 PCI to PCI bridge (PCI express gpp port C) [1002:5a17] (prog-if 00 [Normal decode])
>>>> Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
>>>> 4.1: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>> 4.2: Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
>>>> Latency: 0, Cache Line Size: 64 bytes
>>>> Bus: primary=00, secondary=0a, subordinate=0a, sec-latency=0
>>>> I/O behind bridge: 0000f000-00000fff
>>>> Memory behind bridge: f9f00000-f9ffffff
>>>> Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
>>>> 4.1: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
>>>> 4.2: Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort+ <TAbort- <MAbort- <SERR- <PERR-
>>>> BridgeCtl: Parity+ SERR+ NoISA+ VGA- MAbort- >Reset- FastB2B-
>>>> PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
>>>> Capabilities: [50] Power Management version 3
>>>> Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>>>> Capabilities: [58] Express (v2) Root Port (Slot+), MSI 00
>>>> DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>>>> ExtTag+ RBE+ FLReset-
>>>> DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>>> RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>>> MaxPayload 128 bytes, MaxReadReq 128 bytes
>>>> DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
>>>> LnkCap: Port #1, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <1us, L1 <8us
>>>> ClockPM- Surprise- LLActRep+ BwNot+
>>>> LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
>>>> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>> LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
>>>> SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
>>>> Slot #3, PowerLimit 10.000W; Interlock- NoCompl+
>>>> SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
>>>> Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
>>>> SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
>>>> Changed: MRL- PresDet+ LinkState+
>>
>>> That is probably because of the IO_PAGE_FAULT.
>>
>>> Thanks,
>>> Wei
>>
>>>> serveerstertje:~# lspci -t
>>>> -[0000:00]-+-00.0
>>>>            +-00.2
>>>>            +-02.0-[0b]----00.0
>>>>            +-03.0-[0a]--+-00.0
>>>>            |            +-00.1
>>>>            |            +-00.2
>>>>            |            +-00.3
>>>>            |            +-00.4
>>>>            |            +-00.5
>>>>            |            +-00.6
>>>>            |            \-00.7
>>>>            +-05.0-[09]----00.0
>>>>            +-06.0-[08]----00.0
>>>>            +-0a.0-[07]----00.0
>>>>            +-0b.0-[06]--+-00.0
>>>>            |            \-00.1
>>>>            +-0c.0-[05]----00.0
>>>>            +-0d.0-[04]--+-00.0
>>>>            |            +-00.1
>>>>            |            +-00.2
>>>>            |            +-00.3
>>>>            |            +-00.4
>>>>            |            +-00.5
>>>>            |            +-00.6
>>>>            |            \-00.7
>>>>            +-11.0
>>>>            +-12.0
>>>>            +-12.2
>>>>            +-13.0
>>>>            +-13.2
>>>>            +-14.0
>>>>            +-14.3
>>>>            +-14.4-[03]----06.0
>>>>            +-14.5
>>>>            +-15.0-[02]--
>>>>            +-16.0
>>>>            +-16.2
>>>>            +-18.0
>>>>            +-18.1
>>>>            +-18.2
>>>>            +-18.3
>>>>            \-18.4
>>>>
>>>>> Thanks,
>>>>> Wei
>>>>
>>>>>> I will try to make a complete package, and try with one pv domain only where the devices are being passed through just to simplify the setup.
>>>>>>
>>>>>>> * I would also like to know the symptoms of device 0x0700 when IO_PF
>>>>>>> happened. Did it stop working?
>>>>>>
>>>>>> Yes it stops working, the video capture just freezes, but the driver doesn't bail out.
>>>>>> For the USB controller (0x0a06) it starts to give errors for usbdev_open in the guest.
>>>>>>
>>>>>>> (BTW: I copied a few options from your boot cmd line and it worked with
>>>>>>> my RD890 system
>>>>>>
>>>>>>> dom0_mem=1024M,max:1024M loglvl=all loglvl_guest=all console_timestamps
>>>>>>> cpuidle cpufreq=xen noreboot debug lapic=debug apic_verbosity=debug
>>>>>>> apic=debug iommu=on,verbose,debug,no-sharept
>>>>>>
>>>>>>> * so, what OEM board do you have?)
>>>>>>
>>>>>> MSI 890FXA-GD70
>>>>>>
>>>>>>> Also from your log, these lines look very strange:
>>>>>>
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xd5, mfn=0xa4a11
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xd7, mfn=0xa4a0f
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xd9, mfn=0xa4a0d
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xdb, mfn=0xa4a0b
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xdd, mfn=0xa4a09
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xdf, mfn=0xa4a07
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe1, mfn=0xa4a05
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe3, mfn=0xa4a03
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe5, mfn=0xa4a01
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe7, mfn=0xa463f
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xe9, mfn=0xa463d
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xeb, mfn=0xa463b
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xed, mfn=0xa4639
>>>>>>> (XEN) [2012-09-04 15:54:35] hvm.c:2435:d15 guest attempted write to
>>>>>>> read-only memory page. gfn=0xef, mfn=0xa4637
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 0, device id
>>>>>>> = 0x0a06, fault address = 0xc2c2c2c0
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>> id = 0x0700, fault address = 0xa90f8300
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>> id = 0x0700, fault address = 0xa90f8340
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>> id = 0x0700, fault address = 0xa90f8380
>>>>>>> (XEN) [2012-09-04 16:13:56] AMD-Vi: IO_PAGE_FAULT: domain = 14, device
>>>>>>> id = 0x0700, fault address = 0xa90f83c0
>>>>>>
>>>>>>> * they are just followed by the IO PAGE fault. Do you know where
>>>>>>> they are from? Your video card driver maybe?
>>>>>>
>>>>>> From a HVM domain with an old (3.0.3) kernel, but the faults also occur without this domain being started.
>>>>>>
>>>>>>> Thanks,
>>>>>>> Wei
>>>>>>
>>>>>>>> Complete xl dmesg and lspci -vvvknn attached.
>>>>>>>>
>>>>>>>> Thx
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sander
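
As a reading aid for the identifiers quoted in this thread, here is a minimal sketch in Python. It assumes the conventional 16-bit PCI bus/device/function packing for the AMD-Vi "device id" field and the usual x86 MSI data register layout (vector in the low 8 bits, delivery mode in bits 10:8). The function names are hypothetical and not taken from Xen or lspci.

```python
def decode_device_id(device_id: int) -> str:
    """Split a 16-bit device id into PCI bus:device.function notation.

    Assumed layout: bits 15:8 = bus, bits 7:3 = device, bits 2:0 = function.
    """
    bus = (device_id >> 8) & 0xFF
    dev = (device_id >> 3) & 0x1F
    fn = device_id & 0x7
    return f"{bus:02x}:{dev:02x}.{fn:x}"


def decode_msi_data(data: int) -> dict:
    """Decode the MSI data value that lspci prints as 'Data: ....' (hex).

    Assumed layout (x86): bits 7:0 = vector, bits 10:8 = delivery mode,
    bit 14 = level assert, bit 15 = trigger mode (1 = level).
    """
    return {
        "vector": data & 0xFF,
        "delivery_mode": (data >> 8) & 0x7,
        "level_assert": bool((data >> 14) & 1),
        "level_triggered": bool((data >> 15) & 1),
    }


if __name__ == "__main__":
    # Device ids from the IO_PAGE_FAULT log lines quoted above.
    for dev_id in (0x0A06, 0x0700):
        print(f"device id 0x{dev_id:04x} -> {decode_device_id(dev_id)}")
    # MSI data value from the 4.2 lspci output of the IOMMU device.
    print(decode_msi_data(0x4128))
```

Under these assumptions, device id 0x0a06 corresponds to 0a:00.6 and 0x0700 to 07:00.0, both of which appear in the `lspci -t` tree, and the MSI data value 4128 carries vector 0x28, which is separate from the legacy "IRQ 10" routing that lspci reports.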