Daniel Kiper
2011-Apr-22 21:25 UTC
[Xen-devel] [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
Added the missing Signed-off-by line.

After a lot of debugging and long reading of Linux Kernel and Xen code
I finally killed a deeply hidden bug in pv-grub. Details below.
Additionally, I am CC'ing this e-mail to LKML because this issue
looks like a Linux Kernel problem; however, it is not.

This patch applies to Xen Ver. 4.0, Xen Ver. 4.1 and the unstable tree.

# HG changeset patch
# User dkiper@net-space.pl
# Date 1303474763 -7200
# Node ID b33bf24be129b7b9cd2248460beb1298088c6af5
# Parent dbf2ddf652dc3dd927447e79ef4bc586de55d708
Introduction of Linux Kernel git commit ceefccc93932b920a8ec6f35f596db05202a12fe
(x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB) revealed
a deeply hidden bug in pv-grub. During the kernel load stage the dom->p2m_host[]
list was incorrectly initialized.

At the beginning of the kernel load stage the dom->p2m_host[] list is populated
with the current pfn->mfn layout. Later, during memory allocation (memory is
allocated page by page in kexec_allocate()), the page order is changed to
establish a linear layout in the new domain. This is done by exchanging
subsequent mfns with newly allocated mfns. The dom->p2m_host[] list is indexed
by the currently requested pfn (it is incremented from 0) and by the pfn of the
newly allocated page. If the pfn of the newly allocated page is less than the
currently requested pfn, then the relevant earlier allocated mfn is overwritten,
which leads to a domain crash later. This patch fixes that issue. If the pfn of
the newly allocated page is less than the currently requested pfn, then the
relevant pfn/mfn pair is properly calculated and the usual exchange occurs later.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>

diff -r dbf2ddf652dc -r b33bf24be129 stubdom/grub/kexec.c
--- a/stubdom/grub/kexec.c	Thu Apr 07 15:26:58 2011 +0100
+++ b/stubdom/grub/kexec.c	Fri Apr 22 14:19:23 2011 +0200
@@ -91,6 +91,11 @@ int kexec_allocate(struct xc_dom_image *
         new_pfn = PHYS_PFN(to_phys(pages[i]));
         pages_mfns[i] = new_mfn = pfn_to_mfn(new_pfn);
 
+        if (new_pfn < i)
+            for (new_pfn = i; new_pfn < dom->total_pages; ++new_pfn)
+                if (dom->p2m_host[new_pfn] == new_mfn)
+                    break;
+
         /* Put old page at new PFN */
         dom->p2m_host[new_pfn] = old_mfn;

Daniel

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
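
The overwrite described in the patch can be reproduced outside pv-grub with a small toy model. The sketch below is not the code from stubdom/grub/kexec.c: the names exchange_pages(), is_permutation(), orig[], alloc_pfn[] and the with_fix flag are hypothetical, orig[] stands in for pfn_to_mfn(), p2m[] stands in for dom->p2m_host[], and the allocator is replaced by a fixed list of PFNs. It only illustrates why an allocation with new_pfn < i breaks the exchange, and how the linear search from the patch repairs it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long pfn_t;  /* stand-in for xen_pfn_t */

/*
 * Toy model of the exchange loop (all names here are illustrative).
 *   orig[]      - the running domain's own pfn->mfn mapping (never changes),
 *                 playing the role of pfn_to_mfn()
 *   p2m[]       - the list being rearranged, playing the role of
 *                 dom->p2m_host[]; it must start as a copy of orig[]
 *   alloc_pfn[] - the PFN a hypothetical allocator hands out at step i
 */
static void exchange_pages(pfn_t *p2m, const pfn_t *orig,
                           const pfn_t *alloc_pfn, size_t n_alloc,
                           size_t total, bool with_fix)
{
    for (size_t i = 0; i < n_alloc; i++) {
        pfn_t old_mfn = p2m[i];
        pfn_t new_pfn = alloc_pfn[i];

        assert(new_pfn < total);
        pfn_t new_mfn = orig[new_pfn];

        /*
         * The search from the patch: if new_pfn < i, entry new_pfn was
         * already exchanged in an earlier step, so look for the entry
         * that currently holds new_mfn instead.
         */
        if (with_fix && new_pfn < i)
            for (new_pfn = i; new_pfn < total; ++new_pfn)
                if (p2m[new_pfn] == new_mfn)
                    break;
        assert(new_pfn < total);

        p2m[new_pfn] = old_mfn;  /* put old page at new PFN */
        p2m[i] = new_mfn;        /* put new page at PFN i */
    }
}

/* Returns true if every MFN of orig[] appears exactly once in p2m[]. */
static bool is_permutation(const pfn_t *p2m, const pfn_t *orig, size_t total)
{
    for (size_t a = 0; a < total; a++) {
        size_t hits = 0;

        for (size_t b = 0; b < total; b++)
            if (p2m[b] == orig[a])
                hits++;
        if (hits != 1)
            return false;
    }
    return true;
}
```

With eight pages whose MFNs are 100 to 107 and an allocator that returns PFNs 5, 6 and then 0, the third step has new_pfn = 0 < i = 2. Without the search, entry 0 (already set to MFN 105 in the first step) is overwritten, MFN 105 disappears from the list and MFN 100 appears twice. With the search, MFN 100 is found at PFN 5 and the list stays a permutation of the original MFNs.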
Samuel Thibault
2011-Apr-22 22:33 UTC
Re: [Xen-devel] [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
Hello,

Daniel Kiper, on Fri 22 Apr 2011 23:25:45 +0200, wrote:
> If pfn of newly allocated page is less than currently requested pfn
> then relevant earlier allocated mfn is overwritten which leads to
> domain crash later.

Oops, good catch! And unfortunately it happens only seldom... I guess it
may be the culprit for a fair number of other issues.

> +        if (new_pfn < i)
> +            for (new_pfn = i; new_pfn < dom->total_pages; ++new_pfn)
> +                if (dom->p2m_host[new_pfn] == new_mfn)
> +                    break;

Instead of looking for the page, which takes linear time for each page
and thus potentially quadratic time overall, we should probably rather record
which PFN the MFNs < allocated have been moved to?

Samuel
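
One possible reading of this suggestion, sketched on a toy model rather than on the real kexec.c: remember, for every PFN already processed, the PFN its old page was moved to, and follow those records instead of scanning the list. The names exchange_pages_tracked(), moved_to[], orig[] and alloc_pfn[] are hypothetical; orig[] stands in for pfn_to_mfn(), p2m[] for dom->p2m_host[], and the allocator is replaced by a fixed list of PFNs. This is an assumption about what such a change could look like, not the follow-up patch itself.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long pfn_t;  /* stand-in for xen_pfn_t */

/*
 * Toy model of the exchange loop with move tracking (illustrative names).
 *   orig[]      - the running domain's own pfn->mfn mapping (never changes)
 *   p2m[]       - the list being rearranged; must start as a copy of orig[]
 *   alloc_pfn[] - the PFN a hypothetical allocator hands out at step i
 *   moved_to[]  - caller-provided array of n_alloc entries; moved_to[k]
 *                 records the PFN that received the old page of PFN k
 */
static void exchange_pages_tracked(pfn_t *p2m, const pfn_t *orig,
                                   const pfn_t *alloc_pfn, pfn_t *moved_to,
                                   size_t n_alloc, size_t total)
{
    for (size_t i = 0; i < n_alloc; i++) {
        pfn_t old_mfn = p2m[i];
        pfn_t new_pfn = alloc_pfn[i];

        assert(new_pfn < total);
        pfn_t new_mfn = orig[new_pfn];

        /*
         * If new_pfn < i, the page that used to sit at new_pfn was moved
         * in step new_pfn, possibly more than once. Follow the recorded
         * moves until we reach an entry that has not been processed yet.
         */
        while (new_pfn < i)
            new_pfn = moved_to[new_pfn];
        assert(new_pfn < total && p2m[new_pfn] == new_mfn);

        moved_to[i] = new_pfn;   /* remember where the old page went */
        p2m[new_pfn] = old_mfn;  /* put old page at new PFN */
        p2m[i] = new_mfn;        /* put new page at PFN i */
    }
}
```

Each lookup now follows a chain of recorded moves instead of scanning the rest of the list. How long those chains get depends on the order in which the allocator hands out pages, so this sketch makes no claim about the worst case; it trades one extra array for avoiding the per-page linear scan.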
Konrad Rzeszutek Wilk
2011-Apr-26 13:42 UTC
[Xen-devel] Re: [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
On Fri, Apr 22, 2011 at 11:25:45PM +0200, Daniel Kiper wrote:
> Added the missing Signed-off-by line.
>
> After a lot of debugging and long reading of Linux Kernel and Xen code
> I finally killed a deeply hidden bug in pv-grub. Details below.
> Additionally, I am CC'ing this e-mail to LKML because this issue
> looks like a Linux Kernel problem; however, it is not.
>
> This patch applies to Xen Ver. 4.0, Xen Ver. 4.1 and the unstable tree.
>
> # HG changeset patch
> # User dkiper@net-space.pl
> # Date 1303474763 -7200
> # Node ID b33bf24be129b7b9cd2248460beb1298088c6af5
> # Parent dbf2ddf652dc3dd927447e79ef4bc586de55d708
> Introduction of Linux Kernel git commit ceefccc93932b920a8ec6f35f596db05202a12fe
> (x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB) revealed
> a deeply hidden bug in pv-grub. During the kernel load stage the dom->p2m_host[]
> list was incorrectly initialized.
>
> At the beginning of the kernel load stage the dom->p2m_host[] list is populated
> with the current pfn->mfn layout. Later, during memory allocation (memory is
> allocated page by page in kexec_allocate()), the page order is changed to
> establish a linear layout in the new domain. This is done by exchanging
> subsequent mfns with newly allocated mfns. The dom->p2m_host[] list is indexed
> by the currently requested pfn (it is incremented from 0) and by the pfn of the
> newly allocated page. If the pfn of the newly allocated page is less than the
> currently requested pfn, then the relevant earlier allocated mfn is overwritten,
> which leads to a domain crash later. This patch fixes that issue. If the pfn of
> the newly allocated page is less than the currently requested pfn, then the
> relevant pfn/mfn pair is properly calculated and the usual exchange occurs later.

Nice! I presume this fixes the issue you had at the Xen Hack-O-Thon with
your guest, right?

>
> Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
>
> diff -r dbf2ddf652dc -r b33bf24be129 stubdom/grub/kexec.c
> --- a/stubdom/grub/kexec.c	Thu Apr 07 15:26:58 2011 +0100
> +++ b/stubdom/grub/kexec.c	Fri Apr 22 14:19:23 2011 +0200
> @@ -91,6 +91,11 @@ int kexec_allocate(struct xc_dom_image *
>          new_pfn = PHYS_PFN(to_phys(pages[i]));
>          pages_mfns[i] = new_mfn = pfn_to_mfn(new_pfn);
>
> +        if (new_pfn < i)
> +            for (new_pfn = i; new_pfn < dom->total_pages; ++new_pfn)
> +                if (dom->p2m_host[new_pfn] == new_mfn)
> +                    break;
> +
>          /* Put old page at new PFN */
>          dom->p2m_host[new_pfn] = old_mfn;
>
> Daniel
Daniel Kiper
2011-Apr-26 14:25 UTC
Re: [Xen-devel] [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
On Sat, Apr 23, 2011 at 12:33:32AM +0200, Samuel Thibault wrote:
> Hello,
>
> Daniel Kiper, on Fri 22 Apr 2011 23:25:45 +0200, wrote:
> > If pfn of newly allocated page is less than currently requested pfn
> > then relevant earlier allocated mfn is overwritten which leads to
> > domain crash later.
>
> Oops, good catch! And unfortunately it happens only seldom... I guess it
> may be the culprit for a fair number of other issues.

I discovered that issue on an i386 domU. It does not affect x86_64 in my
environment. However, as you stated above, in some circumstances that issue
could lead to mysterious system crashes or data corruption.

> > +        if (new_pfn < i)
> > +            for (new_pfn = i; new_pfn < dom->total_pages; ++new_pfn)
> > +                if (dom->p2m_host[new_pfn] == new_mfn)
> > +                    break;
>
> Instead of looking for the page, which takes linear time for each page
> and thus potentially quadratic time overall, we should probably rather record
> which PFN the MFNs < allocated have been moved to?

I am going to post a new, time-optimized version of that patch today or
tomorrow.

Daniel
Daniel Kiper
2011-Apr-26 14:34 UTC
[Xen-devel] Re: [PATCH REPOST] pv-grub: Fix for incorrect dom->p2m_host[] list initialization
On Tue, Apr 26, 2011 at 09:42:42AM -0400, Konrad Rzeszutek Wilk wrote:
> On Fri, Apr 22, 2011 at 11:25:45PM +0200, Daniel Kiper wrote:
> > Added the missing Signed-off-by line.
> >
> > After a lot of debugging and long reading of Linux Kernel and Xen code
> > I finally killed a deeply hidden bug in pv-grub. Details below.
> > Additionally, I am CC'ing this e-mail to LKML because this issue
> > looks like a Linux Kernel problem; however, it is not.
> >
> > This patch applies to Xen Ver. 4.0, Xen Ver. 4.1 and the unstable tree.
> >
> > # HG changeset patch
> > # User dkiper@net-space.pl
> > # Date 1303474763 -7200
> > # Node ID b33bf24be129b7b9cd2248460beb1298088c6af5
> > # Parent dbf2ddf652dc3dd927447e79ef4bc586de55d708
> > Introduction of Linux Kernel git commit ceefccc93932b920a8ec6f35f596db05202a12fe
> > (x86: default CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN to 16 MB) revealed
> > a deeply hidden bug in pv-grub. During the kernel load stage the dom->p2m_host[]
> > list was incorrectly initialized.
> >
> > At the beginning of the kernel load stage the dom->p2m_host[] list is populated
> > with the current pfn->mfn layout. Later, during memory allocation (memory is
> > allocated page by page in kexec_allocate()), the page order is changed to
> > establish a linear layout in the new domain. This is done by exchanging
> > subsequent mfns with newly allocated mfns. The dom->p2m_host[] list is indexed
> > by the currently requested pfn (it is incremented from 0) and by the pfn of the
> > newly allocated page. If the pfn of the newly allocated page is less than the
> > currently requested pfn, then the relevant earlier allocated mfn is overwritten,
> > which leads to a domain crash later. This patch fixes that issue. If the pfn of
> > the newly allocated page is less than the currently requested pfn, then the
> > relevant pfn/mfn pair is properly calculated and the usual exchange occurs later.
>
> Nice! I presume this fixes the issue you had at the Xen Hack-O-Thon with
> your guest, right?

Yes, it does. It was very difficult to discover because that issue overlapped
with other memory management issues that have been coming up lately.
Currently, I am working on a time-optimized version of that patch.

Daniel