Gianni Tedesco
2011-Mar-11 18:52 UTC
[Xen-devel] xc: error: xc_machphys_mfn_list: 83 != 129 when suspending 32GB PV DomU
Hi, I have a 256GB host and run a 32GB 64-bit PV domain (SLES 11) on it. When I try and suspend the domain, xc barfs with: xc: error: xc_machphys_mfn_list: 83 != 129: Internal error xc: error: xc_get_m2p_mfns (0 = Success): Internal error xc: error: Failed to map live M2P table (0 = Success): Internal error At first, since dom0 is 32 bit, I suspected the compat layer. However the hypercall in xen/arch/x86/x86_64/compat/mm.c compat_arch_memory_op() seems to agree with the numbers: (XEN) compat_arch_memory_op returned 0 (nr_extents = 83, max_extents = 129)>From this I conclude that everything is working OK at the hypercalllayer. However, looking at the code in compat_arch_memory_op() it appears that the code is failing due to some arcane limits of the compat subsystem. The following code to establish the variable ''limit'' is causing the loop to exit early: limit = (unsigned long)(compat_machine_to_phys_mapping + min_t(unsigned long, max_page, MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); if ( limit > RDWR_COMPAT_MPT_VIRT_END ) limit = RDWR_COMPAT_MPT_VIRT_END; for ( i = 0, v = RDWR_COMPAT_MPT_VIRT_START, last_mfn = 0; (i != xmml.max_extents) && (v < limit); i++, v += 1 << L2_PAGETABLE_SHIFT ) { /* do stuff */ } xmml.nr_extents = i; Further debugging reveals the variables are set as such: (XEN) compat_machine_to_phys_mapping = 18446606377058041856 (XEN) max_page = 67272704 (XEN) MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) = 43515904 (XEN) RDWR_COMPAT_MPT_VIRT_START = 18446606377058041856 (XEN) RDWR_COMPAT_MPT_VIRT_END = 18446606378131783680 (XEN) limit = 18446606377232105472, (1 << L2_PAGETABLE_SHIFT) = 2097152 Could it be that the compat mach-to-phys conversion table size of 1GB is too small? Or that there exists some other arbitrary limit on the size of domains that can be suspended [when using a 32bit dom0] ? Gianni _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2011-Mar-11 19:21 UTC
[Xen-devel] Re: xc: error: xc_machphys_mfn_list: 83 != 129 when suspending 32GB PV DomU
On 11/03/2011 18:52, "Gianni Tedesco" <gianni.tedesco@citrix.com> wrote:> Further debugging reveals the variables are set as such: > (XEN) compat_machine_to_phys_mapping = 18446606377058041856 > (XEN) max_page = 67272704 > (XEN) MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) = 43515904 > (XEN) RDWR_COMPAT_MPT_VIRT_START = 18446606377058041856 > (XEN) RDWR_COMPAT_MPT_VIRT_END = 18446606378131783680 > (XEN) limit = 18446606377232105472, (1 << L2_PAGETABLE_SHIFT) = 2097152 > > Could it be that the compat mach-to-phys conversion table size of 1GB is > too small?It is insufficient to cover all of the system''s memory. The reason for the limit is that a 1GB M2P table is all that is reasonable to map into a 32-bit domain''s address space while still leaving space for the guest''s own mappings. We could make the compat m2p bigger solely for mapping by dom0 when doing save/restore? However, you''d likely still soon hit the mmap limit for the save/restore process in dom0. It might extend the lifetime of your 32-bit dom0 for a bit longer however. Long enough for the lightweight HVM container for PV guests work to get checked in and make our PV 64-bit guest performance better. -- Keir> Or that there exists some other arbitrary limit on the size > of domains that can be suspended [when using a 32bit dom0] ?_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-14 10:20 UTC
Re: [Xen-devel] Re: xc: error: xc_machphys_mfn_list: 83 != 129 when suspending 32GB PV DomU
On Fri, 2011-03-11 at 19:21 +0000, Keir Fraser wrote:> On 11/03/2011 18:52, "Gianni Tedesco" <gianni.tedesco@citrix.com> wrote: > > > Further debugging reveals the variables are set as such: > > (XEN) compat_machine_to_phys_mapping = 18446606377058041856 > > (XEN) max_page = 67272704 > > (XEN) MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) = 43515904 > > (XEN) RDWR_COMPAT_MPT_VIRT_START = 18446606377058041856 > > (XEN) RDWR_COMPAT_MPT_VIRT_END = 18446606378131783680 > > (XEN) limit = 18446606377232105472, (1 << L2_PAGETABLE_SHIFT) = 2097152 > > > > Could it be that the compat mach-to-phys conversion table size of 1GB is > > too small? > > It is insufficient to cover all of the system''s memory. The reason for the > limit is that a 1GB M2P table is all that is reasonable to map into a 32-bit > domain''s address space while still leaving space for the guest''s own > mappings.The compat M2P actually mapped into the guest isn''t 1GB, 1GB would be the entire kernel mapping with no room for anything else. Also 1GB of M2P is enough to cover 1TB of host memory so I don''t think it''s too small at the moment. Is the limit here not MACH2PHYS_COMPAT_NR_ENTRIES? (in the above limit == compat_machine_to_phys_mapping + ~160M) IIRC the size of the M2P which is mapped into a PAE guest is normally capped at ~160M (the total size of the hypervisor hole for a PAE guest running on a PAE hypervisor). 160M is enough M2P for 160G of host address space which would explain why this is seen on a 256GB host but not a 128GB one. The limit on the size of the M2P is adjustable, in particular for dom0 I think it would be reasonable to allow it to expand to, e.g. 256M, without too much cause for concern. Obviously this hole eats into the 1GB kernel mapping so you don''t want it to grow too much bigger and long run something better would be needed but this would probably allow you to support 256GB without too much trouble in the short term, other than slightly reducing the amount of lowmem the system sees (which might be an issue if you''ve chosen dom0_mem on that basis...) The lower limit is set by the kernel in its XEN_ELFNOTE_HV_START_LOW ELF note (set in arch/x86/kernel/head_32-xen.S), which is picked up in xen/arch/x86/build_domain.c:construct_dom0(). NB: This might be the first time this functionality has been used in anger to increase the M2P space (I think it is actively used to shrink it on hosts with <160G). Another alternative, which would allow large hosts without needing to expand the dom0 M2P, would be to provide interfaces that allow the tools to map specific portions of the host M2P so the tools can build themselves a mapcache style thing. The M2P space which needs to be accessed to perform a migration of an individual guest is likely going to be smaller than the total host RAM so even using 256M-512M of guest user-mode address space (allowing for 256GB-512GB of host address space) would likely allow you to map the bits you need without excessive churn (aka performance hit) in the mapping. A given userspace process has 3G of address space to play with so it can take the hit of increasing the M2P mapcache size far easier than the kernel can. Hrm, maybe you don''t even need a map cache thing -- just a way to allow a userspace process to map more M2P than the kernel can... (which might be as simple as removing the limit clamp based on MACH2PHYS_COMPAT_NR_ENTRIES in the compat layer?) Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Gianni Tedesco
2011-Mar-14 15:05 UTC
[Xen-devel] [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
This permits suspend/resume to work with 32bit dom0/tools. AFAICT the limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a limit in kernel mappings under 32bit hypervisors, not userspace where there may be gigabytes of useful virtual space for this. Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> diff -r cf558cb8b92b xen/arch/x86/x86_64/compat/mm.c --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 07 17:52:44 2011 +0000 +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:58:04 2011 +0000 @@ -162,8 +162,7 @@ int compat_arch_memory_op(int op, XEN_GU return -EFAULT; limit = (unsigned long)(compat_machine_to_phys_mapping + - min_t(unsigned long, max_page, - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); + (unsigned long)max_page); if ( limit > RDWR_COMPAT_MPT_VIRT_END ) limit = RDWR_COMPAT_MPT_VIRT_END; for ( i = 0, v = RDWR_COMPAT_MPT_VIRT_START, last_mfn = 0; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2011-Mar-14 15:08 UTC
[Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
Cc''ing Jan. I''d appreciate his Ack as this is his code originally. I don''t think we''re missing anything subtle though. -- Keir On 14/03/2011 15:05, "Gianni Tedesco" <gianni.tedesco@citrix.com> wrote:> This permits suspend/resume to work with 32bit dom0/tools. AFAICT the > limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a > limit in kernel mappings under 32bit hypervisors, not userspace where > there may be gigabytes of useful virtual space for this. > > Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> > Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> > > diff -r cf558cb8b92b xen/arch/x86/x86_64/compat/mm.c > --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 07 17:52:44 2011 +0000 > +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:58:04 2011 +0000 > @@ -162,8 +162,7 @@ int compat_arch_memory_op(int op, XEN_GU > return -EFAULT; > > limit = (unsigned long)(compat_machine_to_phys_mapping + > - min_t(unsigned long, max_page, > - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); > + (unsigned long)max_page); > if ( limit > RDWR_COMPAT_MPT_VIRT_END ) > limit = RDWR_COMPAT_MPT_VIRT_END; > for ( i = 0, v = RDWR_COMPAT_MPT_VIRT_START, last_mfn = 0; > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-14 15:11 UTC
[Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
On Mon, 2011-03-14 at 15:05 +0000, Gianni Tedesco wrote:> This permits suspend/resume to work with 32bit dom0/tools. AFAICT the > limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a > limit in kernel mappings under 32bit hypervisors,32 bit guest on 64 bit h/v, but yes, it refers only to the size of the mapping of the compat M2P which the hypervisor provides for the guest in the "hypervisor hole".> not userspace where > there may be gigabytes of useful virtual space for this.Agreed, and in any case if the guest/tools wants to ask for more mappings than it can cope with, that''s its own problem...> Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> > Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> > > diff -r cf558cb8b92b xen/arch/x86/x86_64/compat/mm.c > --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 07 17:52:44 2011 +0000 > +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:58:04 2011 +0000 > @@ -162,8 +162,7 @@ int compat_arch_memory_op(int op, XEN_GU > return -EFAULT; > > limit = (unsigned long)(compat_machine_to_phys_mapping + > - min_t(unsigned long, max_page, > - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); > + (unsigned long)max_page);max_page is already unsigned long, so only the overall expression needs casting (since compat_machine_to_phys_mapping is an int *), right? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Gianni Tedesco
2011-Mar-14 15:19 UTC
[Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
On Mon, 2011-03-14 at 15:11 +0000, Ian Campbell wrote:> On Mon, 2011-03-14 at 15:05 +0000, Gianni Tedesco wrote: > > This permits suspend/resume to work with 32bit dom0/tools. AFAICT the > > limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a > > limit in kernel mappings under 32bit hypervisors, > > 32 bit guest on 64 bit h/v, but yes, it refers only to the size of the > mapping of the compat M2P which the hypervisor provides for the guest in > the "hypervisor hole".Ah, thanks for clarification.> > not userspace where > > there may be gigabytes of useful virtual space for this. > > Agreed, and in any case if the guest/tools wants to ask for more > mappings than it can cope with, that''s its own problem... > > > Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> > > Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> > > > > diff -r cf558cb8b92b xen/arch/x86/x86_64/compat/mm.c > > --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 07 17:52:44 2011 +0000 > > +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:58:04 2011 +0000 > > @@ -162,8 +162,7 @@ int compat_arch_memory_op(int op, XEN_GU > > return -EFAULT; > > > > limit = (unsigned long)(compat_machine_to_phys_mapping + > > - min_t(unsigned long, max_page, > > - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); > > + (unsigned long)max_page); > > max_page is already unsigned long, so only the overall expression needs > casting (since compat_machine_to_phys_mapping is an int *), right?Right you are: -- This permits suspend/resume to work with 32bit dom0/tools. AFAICT the limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a limit in 32bit guest compat mappings under 64bit hypervisors, not userspace where there may be gigabytes of useful virtual space available for this. Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> diff -r 8b5cbccbc654 xen/arch/x86/x86_64/compat/mm.c --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:59:27 2011 +0000 +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 15:17:59 2011 +0000 @@ -161,9 +161,7 @@ int compat_arch_memory_op(int op, XEN_GU if ( copy_from_guest(&xmml, arg, 1) ) return -EFAULT; - limit = (unsigned long)(compat_machine_to_phys_mapping + - min_t(unsigned long, max_page, - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); + limit = (unsigned long)(compat_machine_to_phys_mapping + max_page); if ( limit > RDWR_COMPAT_MPT_VIRT_END ) limit = RDWR_COMPAT_MPT_VIRT_END; for ( i = 0, v = RDWR_COMPAT_MPT_VIRT_START, last_mfn = 0; _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Mar-14 15:55 UTC
[Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
>>> On 14.03.11 at 16:19, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: > > This permits suspend/resume to work with 32bit dom0/tools. AFAICT the > limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a > limit in 32bit guest compat mappings under 64bit hypervisors, not > userspace where there may be gigabytes of useful virtual space available > for this. > > Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> > Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> > > diff -r 8b5cbccbc654 xen/arch/x86/x86_64/compat/mm.c > --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:59:27 2011 +0000 > +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 15:17:59 2011 +0000 > @@ -161,9 +161,7 @@ int compat_arch_memory_op(int op, XEN_GU > if ( copy_from_guest(&xmml, arg, 1) ) > return -EFAULT; > > - limit = (unsigned long)(compat_machine_to_phys_mapping + > - min_t(unsigned long, max_page, > - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); > + limit = (unsigned long)(compat_machine_to_phys_mapping + max_page);While doing this shouldn''t hurt (except slightly for performance of the hypercall), I don''t see why it''s useful: For slots past MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) you wouldn''t read non-null page table entries anyway (up to RDWR_COMPAT_MPT_VIRT_END), so I don''t see why the tools couldn''t equally well do with what we have currently (after all they get told how many slots were filled). Jan> if ( limit > RDWR_COMPAT_MPT_VIRT_END ) > limit = RDWR_COMPAT_MPT_VIRT_END; > for ( i = 0, v = RDWR_COMPAT_MPT_VIRT_START, last_mfn = 0;_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-14 16:03 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
On Mon, 2011-03-14 at 15:55 +0000, Jan Beulich wrote:> >>> On 14.03.11 at 16:19, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: > > > > This permits suspend/resume to work with 32bit dom0/tools. AFAICT the > > limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a > > limit in 32bit guest compat mappings under 64bit hypervisors, not > > userspace where there may be gigabytes of useful virtual space available > > for this. > > > > Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> > > Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> > > > > diff -r 8b5cbccbc654 xen/arch/x86/x86_64/compat/mm.c > > --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:59:27 2011 +0000 > > +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 15:17:59 2011 +0000 > > @@ -161,9 +161,7 @@ int compat_arch_memory_op(int op, XEN_GU > > if ( copy_from_guest(&xmml, arg, 1) ) > > return -EFAULT; > > > > - limit = (unsigned long)(compat_machine_to_phys_mapping + > > - min_t(unsigned long, max_page, > > - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); > > + limit = (unsigned long)(compat_machine_to_phys_mapping + max_page); > > While doing this shouldn''t hurt (except slightly for performance of > the hypercall), I don''t see why it''s useful: For slots past > MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) you > wouldn''t read non-null page table entries anyway (up to > RDWR_COMPAT_MPT_VIRT_END), so I don''t see why the tools > couldn''t equally well do with what we have currently (after all > they get told how many slots were filled).In order to be able to migrate any guest the tools in domain 0 need to see the entire of host M2P, not just the subset which the kernel sees mapped into its hypervisor hole (which is what MACH2PHYS_COMPAT_NR_ENTRIES represents). The hypercall reads from the global compat M2P mapping, not the guest kernel mapping of it, so it should read valid entries all the way up to RDWR_COMPAT_MPT_VIRT_END, AFAICT. Ian.> > Jan > > > if ( limit > RDWR_COMPAT_MPT_VIRT_END ) > > limit = RDWR_COMPAT_MPT_VIRT_END; > > for ( i = 0, v = RDWR_COMPAT_MPT_VIRT_START, last_mfn = 0; > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Mar-14 16:22 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
>>> On 14.03.11 at 17:03, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > On Mon, 2011-03-14 at 15:55 +0000, Jan Beulich wrote: >> >>> On 14.03.11 at 16:19, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: >> > >> > This permits suspend/resume to work with 32bit dom0/tools. AFAICT the >> > limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a >> > limit in 32bit guest compat mappings under 64bit hypervisors, not >> > userspace where there may be gigabytes of useful virtual space available >> > for this. >> > >> > Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> >> > Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> >> > >> > diff -r 8b5cbccbc654 xen/arch/x86/x86_64/compat/mm.c >> > --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:59:27 2011 +0000 >> > +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 15:17:59 2011 +0000 >> > @@ -161,9 +161,7 @@ int compat_arch_memory_op(int op, XEN_GU >> > if ( copy_from_guest(&xmml, arg, 1) ) >> > return -EFAULT; >> > >> > - limit = (unsigned long)(compat_machine_to_phys_mapping + >> > - min_t(unsigned long, max_page, >> > - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); >> > + limit = (unsigned long)(compat_machine_to_phys_mapping + > max_page); >> >> While doing this shouldn''t hurt (except slightly for performance of >> the hypercall), I don''t see why it''s useful: For slots past >> MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) you >> wouldn''t read non-null page table entries anyway (up to >> RDWR_COMPAT_MPT_VIRT_END), so I don''t see why the tools >> couldn''t equally well do with what we have currently (after all >> they get told how many slots were filled). > > In order to be able to migrate any guest the tools in domain 0 need to > see the entire of host M2P, not just the subset which the kernel sees > mapped into its hypervisor hole (which is what > MACH2PHYS_COMPAT_NR_ENTRIES represents). > > The hypercall reads from the global compat M2P mapping, not the guest > kernel mapping of it, so it should read valid entries all the way up to > RDWR_COMPAT_MPT_VIRT_END, AFAICT.But RDWR_COMPAT_MPT_VIRT_END still doesn''t necessarily cover all of the memory the machine may have (after all the range is way smaller than RDWR_MPT_VIRT_{START,END}. If that''s the goal, then the patch as presented isn''t suitable, as there''s not event a compat table set up for all of the memory. I''d say the tools then need to have access to the native table, reading 64-bit MFNs from it (since, with MFN compression, we can exceed 32-bits). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-14 16:33 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
On Mon, 2011-03-14 at 16:22 +0000, Jan Beulich wrote:> >>> On 14.03.11 at 17:03, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > > On Mon, 2011-03-14 at 15:55 +0000, Jan Beulich wrote: > >> >>> On 14.03.11 at 16:19, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: > >> > > >> > This permits suspend/resume to work with 32bit dom0/tools. AFAICT the > >> > limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a > >> > limit in 32bit guest compat mappings under 64bit hypervisors, not > >> > userspace where there may be gigabytes of useful virtual space available > >> > for this. > >> > > >> > Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> > >> > Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> > >> > > >> > diff -r 8b5cbccbc654 xen/arch/x86/x86_64/compat/mm.c > >> > --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:59:27 2011 +0000 > >> > +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 15:17:59 2011 +0000 > >> > @@ -161,9 +161,7 @@ int compat_arch_memory_op(int op, XEN_GU > >> > if ( copy_from_guest(&xmml, arg, 1) ) > >> > return -EFAULT; > >> > > >> > - limit = (unsigned long)(compat_machine_to_phys_mapping + > >> > - min_t(unsigned long, max_page, > >> > - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); > >> > + limit = (unsigned long)(compat_machine_to_phys_mapping + > > max_page); > >> > >> While doing this shouldn''t hurt (except slightly for performance of > >> the hypercall), I don''t see why it''s useful: For slots past > >> MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) you > >> wouldn''t read non-null page table entries anyway (up to > >> RDWR_COMPAT_MPT_VIRT_END), so I don''t see why the tools > >> couldn''t equally well do with what we have currently (after all > >> they get told how many slots were filled). > > > > In order to be able to migrate any guest the tools in domain 0 need to > > see the entire of host M2P, not just the subset which the kernel sees > > mapped into its hypervisor hole (which is what > > MACH2PHYS_COMPAT_NR_ENTRIES represents). > > > > The hypercall reads from the global compat M2P mapping, not the guest > > kernel mapping of it, so it should read valid entries all the way up to > > RDWR_COMPAT_MPT_VIRT_END, AFAICT. > > But RDWR_COMPAT_MPT_VIRT_END still doesn''t necessarily > cover all of the memory the machine may have (after all the > range is way smaller than RDWR_MPT_VIRT_{START,END}.It''s 1GB which is enough to cover 1TB of host memory, which AFAIK is all we support these days. It certainly buys us time compared with currently failing at 160GB.> If that''s the goal, then the patch as presented isn''t suitable, > as there''s not event a compat table set up for all of the > memory.paging_init seems to do the right thing and setup the compat M2P up to a maximum of RDWR_COMPAT_MPT_VIRT_END.> I''d say the tools then need to have access to the > native table, reading 64-bit MFNs from it (since, with MFN > compression, we can exceed 32-bits).That''s another option I guess. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2011-Mar-14 16:54 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
On 14/03/2011 16:33, "Ian Campbell" <Ian.Campbell@eu.citrix.com> wrote:>> But RDWR_COMPAT_MPT_VIRT_END still doesn''t necessarily >> cover all of the memory the machine may have (after all the >> range is way smaller than RDWR_MPT_VIRT_{START,END}. > > It''s 1GB which is enough to cover 1TB of host memory, which AFAIK is all > we support these days. It certainly buys us time compared with currently > failing at 160GB. > >> If that''s the goal, then the patch as presented isn''t suitable, >> as there''s not event a compat table set up for all of the >> memory. > > paging_init seems to do the right thing and setup the compat M2P up to a > maximum of RDWR_COMPAT_MPT_VIRT_END. > >> I''d say the tools then need to have access to the >> native table, reading 64-bit MFNs from it (since, with MFN >> compression, we can exceed 32-bits). > > That''s another option I guess.It''s not really an option for 4.1.0. Can we at least agree that this is an improvement for now, and in time for 4.1.0? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Mar-14 16:58 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
>>> On 14.03.11 at 17:33, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > On Mon, 2011-03-14 at 16:22 +0000, Jan Beulich wrote: >> >>> On 14.03.11 at 17:03, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: >> > On Mon, 2011-03-14 at 15:55 +0000, Jan Beulich wrote: >> >> >>> On 14.03.11 at 16:19, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: >> >> > >> >> > This permits suspend/resume to work with 32bit dom0/tools. AFAICT the >> >> > limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a >> >> > limit in 32bit guest compat mappings under 64bit hypervisors, not >> >> > userspace where there may be gigabytes of useful virtual space available >> >> > for this. >> >> > >> >> > Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> >> >> > Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> >> >> > >> >> > diff -r 8b5cbccbc654 xen/arch/x86/x86_64/compat/mm.c >> >> > --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:59:27 2011 +0000 >> >> > +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 15:17:59 2011 +0000 >> >> > @@ -161,9 +161,7 @@ int compat_arch_memory_op(int op, XEN_GU >> >> > if ( copy_from_guest(&xmml, arg, 1) ) >> >> > return -EFAULT; >> >> > >> >> > - limit = (unsigned long)(compat_machine_to_phys_mapping + >> >> > - min_t(unsigned long, max_page, >> >> > - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); >> >> > + limit = (unsigned long)(compat_machine_to_phys_mapping + >> > max_page); >> >> >> >> While doing this shouldn''t hurt (except slightly for performance of >> >> the hypercall), I don''t see why it''s useful: For slots past >> >> MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) you >> >> wouldn''t read non-null page table entries anyway (up to >> >> RDWR_COMPAT_MPT_VIRT_END), so I don''t see why the tools >> >> couldn''t equally well do with what we have currently (after all >> >> they get told how many slots were filled). >> > >> > In order to be able to migrate any guest the tools in domain 0 need to >> > see the entire of host M2P, not just the subset which the kernel sees >> > mapped into its hypervisor hole (which is what >> > MACH2PHYS_COMPAT_NR_ENTRIES represents). >> > >> > The hypercall reads from the global compat M2P mapping, not the guest >> > kernel mapping of it, so it should read valid entries all the way up to >> > RDWR_COMPAT_MPT_VIRT_END, AFAICT. >> >> But RDWR_COMPAT_MPT_VIRT_END still doesn''t necessarily >> cover all of the memory the machine may have (after all the >> range is way smaller than RDWR_MPT_VIRT_{START,END}. > > It''s 1GB which is enough to cover 1TB of host memory, which AFAIK is all > we support these days. It certainly buys us time compared with currently > failing at 160GB.1Tb of *contiguous* host memory. And that''s certainly not the limit Xen has been run on, and Xen itself is set up to handle 5Tb. Which I''m seeing to get exceeded on experimental(?) systems... And while I agree that failing at 1Tb is better than failing at 160Gb, I favor fixing this once and completely over doing a little bit of papering over the problem now just to require debugging the same issue again later.>> If that''s the goal, then the patch as presented isn''t suitable, >> as there''s not event a compat table set up for all of the >> memory. > > paging_init seems to do the right thing and setup the compat M2P up to a > maximum of RDWR_COMPAT_MPT_VIRT_END.With 1Gb being the theoretical limit of what a 32-bit guest can see and access, that''s all a guest could ever sensibly ask for (a [hypothetical] domain could ask for having a larger than the default hole with more of the table mapped in). Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Mar-14 17:00 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
>>> On 14.03.11 at 17:54, Keir Fraser <keir.xen@gmail.com> wrote: > On 14/03/2011 16:33, "Ian Campbell" <Ian.Campbell@eu.citrix.com> wrote: > >>> But RDWR_COMPAT_MPT_VIRT_END still doesn''t necessarily >>> cover all of the memory the machine may have (after all the >>> range is way smaller than RDWR_MPT_VIRT_{START,END}. >> >> It''s 1GB which is enough to cover 1TB of host memory, which AFAIK is all >> we support these days. It certainly buys us time compared with currently >> failing at 160GB. >> >>> If that''s the goal, then the patch as presented isn''t suitable, >>> as there''s not event a compat table set up for all of the >>> memory. >> >> paging_init seems to do the right thing and setup the compat M2P up to a >> maximum of RDWR_COMPAT_MPT_VIRT_END. >> >>> I''d say the tools then need to have access to the >>> native table, reading 64-bit MFNs from it (since, with MFN >>> compression, we can exceed 32-bits). >> >> That''s another option I guess. > > It''s not really an option for 4.1.0. Can we at least agree that this is an > improvement for now, and in time for 4.1.0?Yes, as long as the tools can handle the extended output. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Gianni Tedesco
2011-Mar-14 17:09 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
On Mon, 2011-03-14 at 17:00 +0000, Jan Beulich wrote:> >>> On 14.03.11 at 17:54, Keir Fraser <keir.xen@gmail.com> wrote: > > On 14/03/2011 16:33, "Ian Campbell" <Ian.Campbell@eu.citrix.com> wrote: > > > >>> But RDWR_COMPAT_MPT_VIRT_END still doesn''t necessarily > >>> cover all of the memory the machine may have (after all the > >>> range is way smaller than RDWR_MPT_VIRT_{START,END}. > >> > >> It''s 1GB which is enough to cover 1TB of host memory, which AFAIK is all > >> we support these days. It certainly buys us time compared with currently > >> failing at 160GB. > >> > >>> If that''s the goal, then the patch as presented isn''t suitable, > >>> as there''s not event a compat table set up for all of the > >>> memory. > >> > >> paging_init seems to do the right thing and setup the compat M2P up to a > >> maximum of RDWR_COMPAT_MPT_VIRT_END. > >> > >>> I''d say the tools then need to have access to the > >>> native table, reading 64-bit MFNs from it (since, with MFN > >>> compression, we can exceed 32-bits). > >> > >> That''s another option I guess. > > > > It''s not really an option for 4.1.0. Can we at least agree that this is an > > improvement for now, and in time for 4.1.0? > > Yes, as long as the tools can handle the extended output.It certainly seems to work for the domain suspend case, I''m not sure where else it might fail but it-seems-to-me(tm) that tools should be able to handle the output that they asked for. Gianni _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Gianni Tedesco
2011-Mar-14 17:11 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
On Mon, 2011-03-14 at 16:58 +0000, Jan Beulich wrote:> > > It''s 1GB which is enough to cover 1TB of host memory, which AFAIK is > all > > we support these days. It certainly buys us time compared with > currently > > failing at 160GB. > > 1Tb of *contiguous* host memory. And that''s certainly not the limit > Xen has been run on, and Xen itself is set up to handle 5Tb. Which > I''m seeing to get exceeded on experimental(?) systems... > > And while I agree that failing at 1Tb is better than failing at 160Gb, > I favor fixing this once and completely over doing a little bit of > papering over the problem now just to require debugging the same > issue again later.Maybe a comment or a printk() would be advisable here then in the mean time? Gianni _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-15 15:00 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
On Mon, 2011-03-14 at 16:58 +0000, Jan Beulich wrote:> >>> On 14.03.11 at 17:33, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > > On Mon, 2011-03-14 at 16:22 +0000, Jan Beulich wrote: > >> >>> On 14.03.11 at 17:03, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > >> > On Mon, 2011-03-14 at 15:55 +0000, Jan Beulich wrote: > >> >> >>> On 14.03.11 at 16:19, Gianni Tedesco <gianni.tedesco@citrix.com> wrote: > >> >> > > >> >> > This permits suspend/resume to work with 32bit dom0/tools. AFAICT the > >> >> > limit to MACH2PHYS_COMPAT_NR_ENTRIES is redundant since that refers to a > >> >> > limit in 32bit guest compat mappings under 64bit hypervisors, not > >> >> > userspace where there may be gigabytes of useful virtual space available > >> >> > for this. > >> >> > > >> >> > Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com> > >> >> > Signed-off-by: Gianni Tedesco <gianni.tedesco@citrix.com> > >> >> > > >> >> > diff -r 8b5cbccbc654 xen/arch/x86/x86_64/compat/mm.c > >> >> > --- a/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 14:59:27 2011 +0000 > >> >> > +++ b/xen/arch/x86/x86_64/compat/mm.c Mon Mar 14 15:17:59 2011 +0000 > >> >> > @@ -161,9 +161,7 @@ int compat_arch_memory_op(int op, XEN_GU > >> >> > if ( copy_from_guest(&xmml, arg, 1) ) > >> >> > return -EFAULT; > >> >> > > >> >> > - limit = (unsigned long)(compat_machine_to_phys_mapping + > >> >> > - min_t(unsigned long, max_page, > >> >> > - MACH2PHYS_COMPAT_NR_ENTRIES(current->domain))); > >> >> > + limit = (unsigned long)(compat_machine_to_phys_mapping + > >> > max_page); > >> >> > >> >> While doing this shouldn''t hurt (except slightly for performance of > >> >> the hypercall), I don''t see why it''s useful: For slots past > >> >> MACH2PHYS_COMPAT_NR_ENTRIES(current->domain) you > >> >> wouldn''t read non-null page table entries anyway (up to > >> >> RDWR_COMPAT_MPT_VIRT_END), so I don''t see why the tools > >> >> couldn''t equally well do with what we have currently (after all > >> >> they get told how many slots were filled). > >> > > >> > In order to be able to migrate any guest the tools in domain 0 need to > >> > see the entire of host M2P, not just the subset which the kernel sees > >> > mapped into its hypervisor hole (which is what > >> > MACH2PHYS_COMPAT_NR_ENTRIES represents). > >> > > >> > The hypercall reads from the global compat M2P mapping, not the guest > >> > kernel mapping of it, so it should read valid entries all the way up to > >> > RDWR_COMPAT_MPT_VIRT_END, AFAICT. > >> > >> But RDWR_COMPAT_MPT_VIRT_END still doesn''t necessarily > >> cover all of the memory the machine may have (after all the > >> range is way smaller than RDWR_MPT_VIRT_{START,END}. > > > > It''s 1GB which is enough to cover 1TB of host memory, which AFAIK is all > > we support these days. It certainly buys us time compared with currently > > failing at 160GB. > > 1Tb of *contiguous* host memory. And that''s certainly not the limit > Xen has been run on, and Xen itself is set up to handle 5Tb. Which > I''m seeing to get exceeded on experimental(?) systems...Cool, I stand corrected.> And while I agree that failing at 1Tb is better than failing at 160Gb, > I favor fixing this once and completely over doing a little bit of > papering over the problem now just to require debugging the same > issue again later.Unfortunately it''s a bit late to be doing that for 4.1.0 :-(> >> If that''s the goal, then the patch as presented isn''t suitable, > >> as there''s not event a compat table set up for all of the > >> memory. > > > > paging_init seems to do the right thing and setup the compat M2P up to a > > maximum of RDWR_COMPAT_MPT_VIRT_END. > > With 1Gb being the theoretical limit of what a 32-bit guest can > see and access, that''s all a guest could ever sensibly ask for (a > [hypothetical] domain could ask for having a larger than the > default hole with more of the table mapped in).The size of a domain''s hypervisor hole and how much of the m2p it can map via XENMEM_machphys_mfn_list have no relationship though -- that''s the point of this change. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Mar-15 15:09 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
>>> On 15.03.11 at 16:00, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > On Mon, 2011-03-14 at 16:58 +0000, Jan Beulich wrote: >> >>> On 14.03.11 at 17:33, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: >> > paging_init seems to do the right thing and setup the compat M2P up to a >> > maximum of RDWR_COMPAT_MPT_VIRT_END. >> >> With 1Gb being the theoretical limit of what a 32-bit guest can >> see and access, that''s all a guest could ever sensibly ask for (a >> [hypothetical] domain could ask for having a larger than the >> default hole with more of the table mapped in). > > The size of a domain''s hypervisor hole and how much of the m2p it can > map via XENMEM_machphys_mfn_list have no relationship though -- that''s > the point of this change.I understand that (now) - I was just trying to explain why things were coded to setup the full range, but expose only what would be visible through the hole. Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2011-Mar-15 15:14 UTC
Re: [Xen-devel] Re: [PATCH]: Allow tools to map arbitrarily large machphys_mfn_list on 32bit dom0
On Tue, 2011-03-15 at 15:09 +0000, Jan Beulich wrote:> >>> On 15.03.11 at 16:00, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > > On Mon, 2011-03-14 at 16:58 +0000, Jan Beulich wrote: > >> >>> On 14.03.11 at 17:33, Ian Campbell <Ian.Campbell@eu.citrix.com> wrote: > >> > paging_init seems to do the right thing and setup the compat M2P up to a > >> > maximum of RDWR_COMPAT_MPT_VIRT_END. > >> > >> With 1Gb being the theoretical limit of what a 32-bit guest can > >> see and access, that''s all a guest could ever sensibly ask for (a > >> [hypothetical] domain could ask for having a larger than the > >> default hole with more of the table mapped in). > > > > The size of a domain''s hypervisor hole and how much of the m2p it can > > map via XENMEM_machphys_mfn_list have no relationship though -- that''s > > the point of this change. > > I understand that (now) - I was just trying to explain why things > were coded to setup the full range, but expose only what would > be visible through the hole.Ah, right, makes sense. Thanks, Ian, _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel