Li, Haicheng
2009-Jan-22 03:42 UTC
[Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
All,

We hit several system failures on different hardware platforms, all caused by VT-d faults:

err 1: the disk is corrupted by a VT-d fault on SATA.
err 2: the Dom0 kernel panics at boot, caused by a VT-d fault on UHCI.
err 3: Dom0 reports disk errors while creating HVM guests.

The culprit appears to be changeset 19054, "x86_64: Remove statically-partitioned Xen heap."

Detailed error logs are in Bugzilla: http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1409

-haicheng
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2009-Jan-22 07:40 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
Thanks,

I haven't seen any problems outside of VT-d since c/s 19057, btw.

-- Keir
Li, Xin
2009-Jan-22 08:58 UTC
RE: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
We are looking into the issue too. If you have any idea how it's caused, please tell us :-)

Thanks!
-Xin
Keir Fraser
2009-Jan-22 09:23 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
Mmm well not really. :-)

Is there any assumption in the VT-d setup about preventing access to the Xen heap, and could that be broken?

Perhaps the VT-d pagetables are broken, causing bad DMAs leading to data corruption and bad command packets?

-- Keir
Akio Takebe
2009-Jan-22 09:40 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
Hi, Keir

Is compute_dom0_nr_pages() OK? The number of available pages seems to have increased. What do you think?

Best Regards,

Akio Takebe
Keir Fraser
2009-Jan-22 09:47 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
On 22/01/2009 09:40, "Akio Takebe" <takebe_akio@jp.fujitsu.com> wrote:

> Is compute_dom0_nr_pages() OK?
> The number of available pages seems to be increase.
> What do you think?

Now that the heaps are unified, the old Xen heap pages are visible to dom0 and available to be allocated. Is this a problem? Probably not, since we already by default prevent dom0 from allocating all memory up front. There are similar concerns regarding the auto-ballooner by the way, which are more likely to need addressing. If you run with dom0_mem= and no auto-ballooner then there is no worry at all.

Btw these concerns should have nothing to do with the reported VT-d faults.

-- Keir
Kay, Allen M
2009-Jan-23 01:01 UTC
RE: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
Looks like the problem is caused by the xen_in_range() call in intel_iommu_domain_init() (vtd/iommu.c). The definition of xen_in_range() was changed as part of the heap patch.

I'm looking into changing intel_iommu_domain_init() to just map the pages in dom0->page_list. However, this looks more complicated, as d->page_list is not yet initialized at this stage of boot.

Allen
Keir Fraser
2009-Jan-23 08:33 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
Are you sure that is the problem? The xen_in_range() change should make the dom0 VT-d table more permissive, and hence if anything less likely to experience VT-d faults. Also it wouldn't seem to explain problems for HVM guest passthrough.

-- Keir
Kay, Allen M
2009-Jan-23 17:30 UTC
RE: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
I have not figured out why this is the problem yet, but I know that commenting it out makes the problem go away. Leaving tboot_in_range() in does not cause this problem.

Allen
Keir Fraser
2009-Jan-23 18:41 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
Ah, I know what it is! We actually free up bits of the Xen image at the end of Xen bootstrap, and these can now be allocated to a domain (e.g., dom0) and DMAed to. But these will be contained within the bounds of __pa(&_start) and __pa(&_end) and hence will not have been mapped in dom0's vtd tables.

Sadly the fact is that Xen relies on validity of memory from the domain heap as well as the Xen heap anyway, so the avoidance of mapping Xen-critical memory in dom0 vtd tables is inadequate anyway, even on x86_32 and ia64.

Also it's going to be hard to do better while keeping efficiency, since if you only map dom0's pages in its vtd tables then PV backend drivers will not work (they rely on DMAing to/from other domains' pages via grant references). You'd have to dynamically map/unmap as grants get mapped/unmapped, and you may not want the performance hit of that.

I'd personally vote for getting rid of xen_in_range(). Alternatively we could have it merely check for is_kernel_text(), but really, since it is not in any way full protection from dom0, I wonder if it is worth the bother at all.

What do you think?

-- Keir
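The failure mode described above can be illustrated with a small toy model. This is only a sketch, not Xen source: the frame numbers, the `iommu_mapped` array, and every `model_*` name are hypothetical stand-ins chosen for illustration. It shows how skipping the whole image range at boot leaves the later-freed init frames without a DMA mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not Xen source: 16 page frames, with the
 * hypervisor image in frames [4, 10) and its init section in [7, 9). */
#define NR_PAGES    16
#define IMAGE_START 4   /* stands in for the frame of __pa(&_start) */
#define IMAGE_END   10  /* stands in for the frame of __pa(&_end)   */
#define INIT_START  7   /* init frames are freed at the end of boot */
#define INIT_END    9

static bool iommu_mapped[NR_PAGES];

/* Range check in the style described in the thread: whole image. */
static bool model_xen_in_range(size_t pfn)
{
    return pfn >= IMAGE_START && pfn < IMAGE_END;
}

/* Build dom0's DMA mappings at boot, skipping every image frame. */
static void model_dom0_iommu_init(void)
{
    for (size_t pfn = 0; pfn < NR_PAGES; pfn++)
        iommu_mapped[pfn] = !model_xen_in_range(pfn);
}

/* Count frames that are freed after boot (so they may later be handed
 * to dom0 and used as DMA targets) but were never mapped for DMA. */
static size_t model_count_unmapped_freed_pages(void)
{
    size_t unmapped = 0;

    /* The model assumes the init section lies inside the image. */
    assert(INIT_START >= IMAGE_START && INIT_END <= IMAGE_END);

    model_dom0_iommu_init();
    for (size_t pfn = INIT_START; pfn < INIT_END; pfn++)
        if (!iommu_mapped[pfn])
            unmapped++;
    return unmapped;
}
```

In this model, calling `model_count_unmapped_freed_pages()` reports that both freed init frames are unmapped; a DMA to either of them would fault, which matches the symptom reported at the top of the thread.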
Keir Fraser
2009-Jan-23 18:44 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
On 23/01/2009 18:41, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> What do you think?

I should add that you could still implement the more sophisticated and slower full protection, where dom0 only has DMA access to pages it currently has access to via the host CPUs, as a boot option. For those who really don't want to trust dom0 as far as possible.

-- Keir
Kay, Allen M
2009-Jan-23 23:40 UTC
RE: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
I talked to Joe Cihula about this. He suggests mapping only the RAM regions in the E820 table. This is more secure than mapping everything below max_page. We can do this for x86_64 and x86_32. For IA-64, we would still map everything below max_page, as there is no tboot issue.

What do you think of this approach?

Allen
Cihula, Joseph
2009-Jan-24 00:34 UTC
RE: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
> From: Kay, Allen M
> Sent: Friday, January 23, 2009 3:40 PM
>
> I talked to Joe Cihula about this. He is suggesting map only the RAM memory in E820 table.
> This is more secure than map everything below max_page. We can do this for x86_64 and x86_32.
> For IA-64, we still map everything below max_page as there is no tboot issue.
>
> What do you think of is approach?
>
> Allen

But excluding the Xen text sections using is_kernel_text().

Joe
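The proposal in the two messages above (map only E820 RAM, but exclude the Xen text) can be sketched as follows. This is a minimal model under stated assumptions, not the patch that was eventually written: the `model_e820_entry` layout, the text bounds, the sample memory map, and all `model_*` names are hypothetical, and entries are assumed to be page-aligned. Only the E820 type values (1 for RAM, 2 for reserved) follow the usual E820 convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT    12
#define MODEL_E820_RAM      1
#define MODEL_E820_RESERVED 2

/* Hypothetical, simplified E820 entry; assumed page-aligned. */
struct model_e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical physical bounds of the hypervisor text, [stext, etext). */
static const uint64_t model_stext = 0x100000;
static const uint64_t model_etext = 0x180000;

static bool model_is_kernel_text(uint64_t paddr)
{
    return paddr >= model_stext && paddr < model_etext;
}

/* Count the pages that would be mapped for dom0 DMA: every page of
 * every RAM entry, except pages holding hypervisor text. A real
 * implementation would install an IOMMU mapping instead of counting. */
static size_t model_count_dom0_dma_pages(const struct model_e820_entry *map,
                                         size_t nr)
{
    size_t mapped = 0;

    for (size_t i = 0; i < nr; i++) {
        if (map[i].type != MODEL_E820_RAM)
            continue; /* reserved, ACPI, etc. are never mapped */

        uint64_t end = map[i].addr + map[i].size;
        for (uint64_t pa = map[i].addr; pa < end;
             pa += (uint64_t)1 << MODEL_PAGE_SHIFT)
            if (!model_is_kernel_text(pa))
                mapped++;
    }
    return mapped;
}

/* Sample map: 0x9f pages of low RAM, a reserved hole, 1 MiB of RAM
 * at 1 MiB whose first 0x80 pages hold the (hypothetical) text. */
static size_t model_demo_count(void)
{
    static const struct model_e820_entry map[] = {
        { 0x000000, 0x09f000, MODEL_E820_RAM },
        { 0x09f000, 0x061000, MODEL_E820_RESERVED },
        { 0x100000, 0x100000, MODEL_E820_RAM },
    };
    return model_count_dom0_dma_pages(map, sizeof(map) / sizeof(map[0]));
}
```

With the sample map, 159 low pages plus 128 non-text pages of the second RAM region are mapped, and the reserved hole is skipped entirely. Note that, unlike a check against the whole image, this leaves freed init pages mappable.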
Cihula, Joseph
2009-Jan-24 02:19 UTC
RE: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
> From: xen-devel-bounces@lists.xensource.com [mailto:xen-devel-bounces@lists.xensource.com] On > Behalf Of Keir Fraser > Sent: Friday, January 23, 2009 10:42 AM > > Ah, I know what it is! We actually free up bits of the Xen image at the end > of Xen bootstrap, and these can now be allocated to a domain (e.g., dom0) > and DMAed to. But these will be contained within the bounds of __pa(&_start) > and __pa(&_end) and hence will not have been mapped in dom0''d vtd tables. > > Sadly the fact is that Xen relies on validity of memory from the domain heap > as well as Xen heap anyway, so the avoidance of mapping Xen-critical memory > in dom0 vtd tables is inadequate anyway, even on x86_32 and ia64. > > Also it''s going to be hard to do better while keeping efficiency since if > you only map dom0''s pages in its vtd tables then PV backend drivers will not > work (which rely on DMAing to/from other domain''s pages via grant > references). You''d have to dynamically map/unmap as grants get > mapped/unmapped, and you may not want the performance hit of that. > > I''d personally vote for getting rid of xen_in_range(). Alternatively we > could have it merely check for is_kernel_text(), but really I think since it > is not in any way full protection from dom0 I wonder if it is worth the > bother at all. > > What do you think? > > -- KeirSince this is somewhat similar to the issue I''m facing with the TXT patch, it does seem useful to have a good way of knowing where all of the hypervisor memory is. I looked at is_kernel_text() and that only compares against _stext/_etext, which after looking at the xen.lds file, is really just some of the code of the hypervisor. Is there any reason not to use [_stext, __init_begin) + [__per_cpu_start, __per_cpu_end] + [__bss_start, _end] + [bootsym_phys(trampoline_start), bootsym_phys(trampoline_end)] as a first approximation of hypervisor memory (I''m assuming that the code within [__init_begin, __init_end] is what you reclaim)? 
While this still doesn''t get the xen heap or domain heap, it at least gets us a little farther. For the MAC aspect of the TXT patch, we need to know all of the code + data that could be used during resume and before the xen code that MACs everything else. This includes the stack, page tables, etc. We''ve also added a fn that checks the ACPI Sx addresses against xen memory (hypervisor + domain) to ensure that tboot can''t be tricked into overwriting xen as part of S3. This should be a more comprehensive check than for MAC, since there is no way of detecting if we missed some range. Joe> > On 23/01/2009 17:30, "Kay, Allen M" <allen.m.kay@intel.com> wrote: > > > I have not figured out why this is the problem yet but I know comment it out > > makes the problem go away. Leaving tboot_in_range() in does not cause this > > problem. > > > > Allen > > > > -----Original Message----- > > From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] > > Sent: Friday, January 23, 2009 12:34 AM > > To: Kay, Allen M; Li, Xin; Li, Haicheng; ''xen-devel@lists.xensource.com'' > > Subject: Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or > > Dom0 kernel panic. > > > > Are you sure that is the problem? The xen_in_range() change should make the > > dom0 VT-d table more permissive, and hence if anything less likely to > > experience VT-d faults. Also it wouldn''t seem to explain problems for HVM > > guest passthrough. > > > > -- Keir > > > > On 23/01/2009 01:01, "Kay, Allen M" <allen.m.kay@intel.com> wrote: > > > >> Looks like the problem is caused by xen_in_range() call in > >> vtd/iommu.c/intel_iommu_domain_init(). Definition of xen_in_range() was > >> changed as part of the heap patch. > >> > >> I''m looking into change intel_iommu_domain_init() to just map pages in > >> dom0->page_list. However this looks to be more complicated as d->page_list > >> is > >> not initialized at this stage of the boot yet. 
> >>
> >> Allen
> >>
> >> -----Original Message-----
> >> From: xen-devel-bounces@lists.xensource.com
> >> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Keir Fraser
> >> Sent: Thursday, January 22, 2009 1:23 AM
> >> To: Li, Xin; Li, Haicheng; 'xen-devel@lists.xensource.com'
> >> Subject: Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or
> >> Dom0 kernel panic.
> >>
> >> Mmm well not really. :-)
> >>
> >> Is there any assumption in the VT-d setup about preventing access to the Xen
> >> heap, and could that be broken?
> >>
> >> Perhaps the VT-d pagetables are broken causing bad DMAs leading to data
> >> corruption and bad command packets?
> >>
> >>  -- Keir
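As a rough illustration of the range list Joe proposes in the message above, the check could be written as a table of address ranges. This is only a sketch: the addresses are made-up placeholders standing in for the linker symbols and trampoline bounds, every range is treated as half-open, and in_hypervisor_image() is a hypothetical name, not a function from the Xen tree.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* One physical address range, [start, end). */
struct range {
    paddr_t start;
    paddr_t end;
    const char *name;
};

/*
 * Placeholder layout (hypothetical values).  In the hypervisor these bounds
 * would come from the linker symbols named in the message.  The hole between
 * 0x00200000 and 0x00240000 stands for the init section that is reclaimed
 * after boot and is therefore deliberately not listed.
 */
static const struct range image_ranges[] = {
    { 0x00100000, 0x00200000, "[_stext, __init_begin)" },
    { 0x00240000, 0x00250000, "[__per_cpu_start, __per_cpu_end)" },
    { 0x00250000, 0x00300000, "[__bss_start, _end)" },
    { 0x0008c000, 0x00090000, "trampoline" },
};

/* Return 1 if p falls inside any listed range, 0 otherwise. */
static int in_hypervisor_image(paddr_t p)
{
    size_t i;

    for (i = 0; i < sizeof(image_ranges) / sizeof(image_ranges[0]); i++) {
        if (p >= image_ranges[i].start && p < image_ranges[i].end)
            return 1;
    }
    return 0;
}
```

As the message itself notes, a list like this covers neither the Xen heap nor the domain heap, so it is at best a first approximation.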
Keir Fraser
2009-Jan-24 09:15 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
On 23/01/2009 23:40, "Kay, Allen M" <allen.m.kay@intel.com> wrote:

> I talked to Joe Cihula about this. He is suggesting map only the RAM memory
> in E820 table. This is more secure than map everything below max_page. We
> can do this for x86_64 and x86_32. For IA-64, we still map everything below
> max_page as there is no tboot issue.
>
> What do you think of is approach?

That's an orthogonal issue to avoiding Xen's RAM, but it at least ought to
be easy to do. As long as it doesn't skip any private BIOS buffers for any
devices which are still fully or partially under BIOS control (e.g., via
SMM). But any such buffers above max_page would already be skipped.

I can check in a patch for this as well as a patch to fix xen_in_range().
I'll do both.

 -- Keir
Keir Fraser
2009-Jan-24 09:26 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
On 24/01/2009 09:15, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> I talked to Joe Cihula about this. He is suggesting map only the RAM memory
>> in E820 table. This is more secure than map everything below max_page. We
>> can do this for x86_64 and x86_32. For IA-64, we still map everything below
>> max_page as there is no tboot issue.
>>
>> What do you think of is approach?
>
> That's an orthogonal issue to avoiding Xen's RAM, but it at least ought to
> be easy to do. As long as it doesn't skip any private BIOS buffers for any
> devices which are still fully or partially under BIOS control (e.g., via
> SMM). But any such buffers above max_page would already be skipped.
>
> I can check in a patch for this as well as a patch to fix xen_in_range().
> I'll do both.

Changesets 19081 and 19082.

 -- Keir
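The two policies discussed in this exchange (map only E820 RAM below max_page, and leave Xen's own memory out of dom0's VT-d tables) could be combined along the following lines. This is a sketch of the idea only, not the contents of changesets 19081 and 19082: toy_is_ram(), toy_is_xen(), should_map_for_dom0(), count_mapped() and the address layout are all hypothetical stand-ins.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

typedef uint64_t paddr_t;

/* Toy stand-in for an E820 lookup: RAM is [0, 0x9f000) and [1 MiB, 16 MiB). */
static int toy_is_ram(paddr_t p)
{
    return p < 0x9f000ULL || (p >= 0x100000ULL && p < 0x1000000ULL);
}

/* Toy stand-in for a Xen range check: Xen occupies [2 MiB, 4 MiB). */
static int toy_is_xen(paddr_t p)
{
    return p >= 0x200000ULL && p < 0x400000ULL;
}

/* Decide whether a page frame would be mapped for dom0 under this policy. */
static int should_map_for_dom0(unsigned long pfn, unsigned long max_page)
{
    paddr_t p = (paddr_t)pfn << PAGE_SHIFT;

    if (pfn >= max_page)
        return 0;          /* beyond the end of usable memory */
    if (!toy_is_ram(p))
        return 0;          /* reserved or unusable in the memory map */
    if (toy_is_xen(p))
        return 0;          /* hypervisor memory stays unmapped */
    return 1;
}

/* Count the frames below max_page that the policy would map. */
static unsigned long count_mapped(unsigned long max_page)
{
    unsigned long pfn, n = 0;

    for (pfn = 0; pfn < max_page; pfn++)
        n += should_map_for_dom0(pfn, max_page);
    return n;
}
```

Note the caveat raised earlier in the thread: a static policy like this does not account for pages that PV backend drivers DMA to via grant references.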
Cihula, Joseph
2009-Jan-24 19:07 UTC
RE: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Saturday, January 24, 2009 1:27 AM
>
> On 24/01/2009 09:15, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
> >> I talked to Joe Cihula about this. He is suggesting map only the RAM memory
> >> in E820 table. This is more secure than map everything below max_page. We
> >> can do this for x86_64 and x86_32. For IA-64, we still map everything below
> >> max_page as there is no tboot issue.
> >>
> >> What do you think of is approach?
> >
> > That's an orthogonal issue to avoiding Xen's RAM, but it at least ought to
> > be easy to do. As long as it doesn't skip any private BIOS buffers for any
> > devices which are still fully or partially under BIOS control (e.g., via
> > SMM). But any such buffers above max_page would already be skipped.
> >
> > I can check in a patch for this as well as a patch to fix xen_in_range().
> > I'll do both.
>
> Changesets 19081 and 19082.

Since the tboot memory is marked as reserved or unusable, the
tboot_in_range() call is no longer needed. Do you want me to add that to my
patch set?

You've used a call to memory_is_conventional_ram() to check whether the page
is in non-reserved RAM, but in looking at the function (below), I don't see
how the '(e820.map[i].size > p)' test is valid--shouldn't it be
'((e820.map[i].addr + e820.map[i].size) > p)'?

int memory_is_conventional_ram(paddr_t p)
{
    int i;

    for ( i = 0; i < e820.nr_map; i++ )
    {
        if ( (e820.map[i].type == E820_RAM) &&
             (e820.map[i].addr <= p) &&
             (e820.map[i].size > p) )
            return 1;
    }

    return 0;
}
Keir Fraser
2009-Jan-24 19:58 UTC
Re: [Xen-devel] Critical bug: VT-d fault causes disk corruption or Dom0 kernel panic.
On 24/01/2009 19:07, "Cihula, Joseph" <joseph.cihula@intel.com> wrote:

> Since the tboot memory is marked as reserved or unusable, the tboot_in_range()
> call is no longer needed. Do you want me to add that to my patch set?

I'll remove it.

> You've used a call to memory_is_conventional_ram() to check whether the page
> is in non-reserved RAM, but in looking at the function (below), I don't see
> how the '(e820.map[i].size > p)' test is valid--shouldn't it be
> '((e820.map[i].addr + e820.map[i].size) > p)'?

Yeah, that makes no sense. I'll fix it.

 Thanks.

 -- Keir
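To make the bug Joe points out concrete, here is a small standalone sketch comparing the quoted test with the corrected one he suggests. The sample memory map and the names is_ram_buggy() and is_ram_fixed() are made up for illustration; only the two comparisons come from the thread.

```c
#include <stdint.h>

typedef uint64_t paddr_t;

#define E820_RAM      1
#define E820_RESERVED 2

struct e820entry {
    uint64_t addr;   /* start of the region */
    uint64_t size;   /* length of the region in bytes */
    uint32_t type;
};

/* Hypothetical sample map: low RAM, a reserved hole, RAM from 1 MiB to 1 GiB. */
static const struct e820entry sample_map[] = {
    { 0x00000000ULL, 0x0009f000ULL, E820_RAM },
    { 0x0009f000ULL, 0x00061000ULL, E820_RESERVED },
    { 0x00100000ULL, 0x3ff00000ULL, E820_RAM },
};
static const unsigned int sample_nr =
    sizeof(sample_map) / sizeof(sample_map[0]);

/* The test as quoted: compares p against the region's size, not its end. */
static int is_ram_buggy(paddr_t p)
{
    unsigned int i;

    for (i = 0; i < sample_nr; i++) {
        if (sample_map[i].type == E820_RAM &&
            sample_map[i].addr <= p &&
            sample_map[i].size > p)
            return 1;
    }
    return 0;
}

/* The suggested correction: compares p against addr + size. */
static int is_ram_fixed(paddr_t p)
{
    unsigned int i;

    for (i = 0; i < sample_nr; i++) {
        if (sample_map[i].type == E820_RAM &&
            sample_map[i].addr <= p &&
            (sample_map[i].addr + sample_map[i].size) > p)
            return 1;
    }
    return 0;
}
```

With this sample map, an address such as 0x3ff80000 lies inside the third region but is larger than that region's size, so the original test rejects it while the corrected test accepts it. Low addresses happen to pass both tests, which may be why the problem was easy to miss.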