I finally got around to implementing "ballooning up" in the pvops kernels. Now if you start a domain with "memory=X maxmem=Y", the domain will start with X MB of memory, but you can use "x[ml] mem-set" to expand the domain up to Y.

This relies on the toolstack setting the E820 map for the domain with an E820_RAM region which goes beyond xen_start_info->nr_pages. When the domain starts up and sees this, it adds the extra pages to the kernel's E820 map, but marks them reserved. This causes the kernel to allocate page structures for that memory, but it doesn't attempt to allocate or use them. When the balloon driver starts, it adds those pages to the list of ballooned-out pages, and everything works as expected from there.

This also means that you can fail to boot if Y is many times larger than X, because the kernel's memory gets filled with page structures. This can be particularly acute on 32-bit domains, as the page structures must be in low memory.

As a side-effect, it also works for dom0. If you set dom0_mem on the Xen command line, then nr_pages is limited to that value, but the kernel can still see the system's real E820 map, and therefore adds all the system's memory to its own balloon driver, potentially allowing dom0 to expand up to take all physical memory.

However, this may cause bad side-effects if your system memory is much larger than your dom0_mem, especially if you use a 32-bit dom0. I may need to add a kernel command line option to limit the max initial balloon size to mitigate this...

Also, any unused pages released at boot time (because they fall into holes between E820 regions) are added to the balloon, so they can be ballooned back in again (this doesn't happen automatically, however).

(Konrad, the infrastructure put in place also makes it very easy for the kernel to punch a PCI hole in its own E820 maps to leave space for passthrough devices if we want. Or if the tools pass in an E820 map with a suitable hole, then it will be automatically honoured.)

These changes are in xen/next-2.6.32 for the moment. I'll merge them into xen/stable-2.6.32.x if they don't cause too many problems.

Also, there's currently a bug in the xl toolstack which causes it to ignore the maxmem domain parameter. Stefano has a pending patch to fix this.

I haven't tested this with xm/xend, XCP, XenClient or XenServer - please let me know how it goes.

J
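To make the mechanism concrete, here is a minimal userspace sketch of the clamping logic described above - the structure layout, the sample map, and all names are invented for illustration; this is not the actual pvops code:

```c
/* Illustrative sketch (not the actual pvops code): walk an E820 map,
 * keep RAM up to nr_pages usable, and record anything beyond it as an
 * "extra" region to hand to the balloon driver as ballooned-out pages. */
#include <stdio.h>
#include <stdint.h>

#define E820_RAM   1
#define PAGE_SHIFT 12

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

int main(void)
{
    /* Toolstack-provided map: a 2 GB E820_RAM region... */
    struct e820entry map[] = {
        { 0, 2ULL << 30, E820_RAM },
    };
    /* ...but xen_start_info->nr_pages says only 512 MB is populated. */
    uint64_t nr_pages = (512ULL << 20) >> PAGE_SHIFT;
    uint64_t mem_end  = nr_pages << PAGE_SHIFT;

    for (unsigned i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (map[i].type != E820_RAM || map[i].addr + map[i].size <= mem_end)
            continue;
        uint64_t extra_start = map[i].addr > mem_end ? map[i].addr : mem_end;
        uint64_t extra_size  = map[i].addr + map[i].size - extra_start;
        /* The kernel would mark this range reserved: page structures get
         * allocated for it, but it is never used directly; the balloon
         * driver later treats these pages as ballooned out. */
        printf("extra region: %#llx-%#llx (%llu MB) -> balloon\n",
               (unsigned long long)extra_start,
               (unsigned long long)(extra_start + extra_size),
               (unsigned long long)(extra_size >> 20));
    }
    return 0;
}
```

So with "memory=512 maxmem=2048" in the domain config, the toolstack would hand the domain a map like the sample one, and "x[ml] mem-set" could later balloon the domain anywhere up to the 2 GB limit.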
On Tue, 2010-09-07 at 09:36 +0100, Jeremy Fitzhardinge wrote:
> I finally got around to implementing "ballooning up" in the pvops
> kernels. Now if you start a domain with "memory=X maxmem=Y", the domain
> will start with X MB of memory, but you can use "x[ml] mem-set" to
> expand the domain up to Y.

Cool. What did the issue with plymouth and friends turn out to be?

Ian.
On 09/07/2010 08:14 PM, Ian Campbell wrote:
> On Tue, 2010-09-07 at 09:36 +0100, Jeremy Fitzhardinge wrote:
>> I finally got around to implementing "ballooning up" in the pvops
>> kernels. Now if you start a domain with "memory=X maxmem=Y", the domain
>> will start with X MB of memory, but you can use "x[ml] mem-set" to
>> expand the domain up to Y.
> Cool. What did the issue with plymouth and friends turn out to be?

It was totalram_pages getting decremented when pages were being appended to the balloon, even though those pages were never counted. So it got very low, and while it isn't actually used to account for how much free memory there is, some random pieces of code use something based on it to get a rough metric for free memory and block waiting for it to go up, or return EAGAIN (or something).

It was a bit hard to observe directly because totalram_pages isn't displayed in /proc/meminfo, but removing the decrement showed that was the problem.

J
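As a toy model of the bug (made-up page counts, not the driver's real code), the fix amounts to not decrementing the counter for pages that were never counted in the first place:

```c
/* Toy model of the accounting bug described above (illustrative only).
 * "Extra" pages were never added to totalram_pages, but the old code
 * still decremented the counter when appending them to the balloon, so
 * the free-memory heuristics built on totalram_pages saw a tiny total. */
#include <stdio.h>

int main(void)
{
    long totalram_pages = 131072;   /* 512 MB of counted pages */
    long extra_pages    = 393216;   /* 1.5 GB of never-counted extra pages */

    /* Buggy: decrement for pages that were never added. */
    long buggy = totalram_pages - extra_pages;
    /* Fixed: appending uncounted pages leaves the counter alone. */
    long fixed = totalram_pages;

    printf("buggy totalram_pages: %ld (near/below zero => stalls)\n", buggy);
    printf("fixed totalram_pages: %ld\n", fixed);
    return 0;
}
```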
On Tue, 2010-09-07 at 14:26 +0100, Jeremy Fitzhardinge wrote:
> On 09/07/2010 08:14 PM, Ian Campbell wrote:
> > Cool. What did the issue with plymouth and friends turn out to be?
>
> It was totalram_pages getting decremented when pages were being appended
> to the balloon, even though those pages were never counted. So it got
> very low, and while it isn't actually used to account for how much free
> memory there is, some random pieces of code use something based on it to
> get a rough metric for free memory and block waiting for it to go up, or
> return EAGAIN (or something).
>
> It was a bit hard to observe directly because totalram_pages isn't
> displayed in /proc/meminfo, but removing the decrement showed that was
> the problem.

Subtle! Well spotted.

Ian.
> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> Cc: Dan Magenheimer; Daniel Kiper; Stefano Stabellini; Konrad Rzeszutek Wilk
> Subject: Ballooning up
>
> I finally got around to implementing "ballooning up" in the pvops
> kernels. Now if you start a domain with "memory=X maxmem=Y", the domain
> will start with X MB of memory, but you can use "x[ml] mem-set" to
> expand the domain up to Y.

Nice!

> As a side-effect, it also works for dom0. If you set dom0_mem on the
> Xen command line, then nr_pages is limited to that value, but the kernel
> can still see the system's real E820 map, and therefore adds all the
> system's memory to its own balloon driver, potentially allowing dom0 to
> expand up to take all physical memory.
>
> However, this may cause bad side-effects if your system memory is much
> larger than your dom0_mem, especially if you use a 32-bit dom0. I may
> need to add a kernel command line option to limit the max initial
> balloon size to mitigate this...

I would call this dom0 functionality a bug. I think both Citrix and Oracle use dom0_mem as a normal boot option for every installation and, while I think both employ heuristics to choose a larger dom0_mem for larger physical memory, I don't think it grows large enough for, say, >256GB physical memory, to accommodate the necessarily large number of page tables.

So, I'd vote for NOT allowing dom0 to balloon up to physical memory if dom0_mem is specified, and possibly a kernel command line option that allows it to grow beyond. Or, possibly, no option and never allow dom0 memory to grow beyond dom0_mem unless (possibly) it grows with hot-plug.

Dan
Also, looking at the latest code in xen/next-2.6.32, I see you have removed the balloon lock. Isn't this necessary to ensure multiple vcpus aren't racing on adjusting the balloon size (and performing the hypercalls to do it)? IOW, are increase/decrease_reservation and the calls into the hypervisor thread-safe?

And, related, especially if the lock goes away (repeating the question asked here: http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01664.html), wouldn't it be better to use a separate workqueue rather than the kernel default queue, and is there any reason to queue the work on every cpu rather than just one?

Thanks,
Dan
On 09/13/2010 02:17 PM, Dan Magenheimer wrote:
>> As a side-effect, it also works for dom0. If you set dom0_mem on the
>> Xen command line, then nr_pages is limited to that value, but the kernel
>> can still see the system's real E820 map, and therefore adds all the
>> system's memory to its own balloon driver, potentially allowing dom0 to
>> expand up to take all physical memory.
>>
>> However, this may cause bad side-effects if your system memory is much
>> larger than your dom0_mem, especially if you use a 32-bit dom0. I may
>> need to add a kernel command line option to limit the max initial
>> balloon size to mitigate this...
> I would call this dom0 functionality a bug. I think both Citrix
> and Oracle use dom0_mem as a normal boot option for every
> installation and, while I think both employ heuristics to choose
> a larger dom0_mem for larger physical memory, I don't think it
> grows large enough for, say, >256GB physical memory, to accommodate
> the necessarily large number of page tables.
>
> So, I'd vote for NOT allowing dom0 to balloon up to physical
> memory if dom0_mem is specified, and possibly a kernel command
> line option that allows it to grow beyond. Or, possibly, no
> option and never allow dom0 memory to grow beyond dom0_mem
> unless (possibly) it grows with hot-plug.

Yes, it's a bit of a problem. The trouble is that the kernel can't really distinguish the two cases; either way, it sees a Xen-supplied xen_start_info->nr_pages as the amount of initial memory available, and an E820 table referring to more RAM beyond that.

I guess there are three options:

  1. add a "xen_maxmem" (or something) kernel parameter to override
     space specified in the E820 table
  2. ignore the E820 if it's a privileged domain
  3. only allow extra memory up to a certain ratio of the base memory
     (8x? 16x? 32x?)

I think the third is probably the simplest and least hacky, as it directly addresses the underlying issue (and prevents domU mishaps as well).

J
On 09/13/2010 02:39 PM, Dan Magenheimer wrote:
> Also, looking at the latest code in xen/next-2.6.32, I see
> you have removed the balloon lock. Isn't this necessary
> to ensure multiple vcpus aren't racing on adjusting the
> balloon size (and performing the hypercalls to do it)?
> IOW, are increase/decrease_reservation and the calls
> into the hypervisor thread-safe?

Yes, because they are all done within the same tasklet, so there's no possibility of races.

> And, related, especially if the lock goes away (repeating
> the question asked here:
> http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01664.html),
> wouldn't it be better to use a separate workqueue rather than
> the kernel default queue,

There's a preference to use the default queue unless there's a strong reason to do otherwise, to stop proliferation of kernel tasks. Is there a strong reason to use a specific balloon workqueue?

> and is there any reason to
> queue the work on every cpu rather than just one?

There's a keventd on every CPU, but work is queued on only one CPU at a time - it tends to end up running on the CPU which requested the work to be queued, but if it is already queued then it will be left as-is.

I am seeing something queuing delayed work at 1kHz continuously, at least in dom0. Haven't worked out what's going on there...

J
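For reference, a kernel-style sketch of the single-worker pattern being described - the names and the atomic target variable are illustrative, not the actual balloon driver code. Because only the one work item ever issues the reservation hypercalls, increase/decrease_reservation cannot race and need no lock:

```c
/* Kernel-style sketch (illustrative): all reservation changes funnel
 * through one delayed work item on the default workqueue. */
#include <linux/workqueue.h>
#include <linux/atomic.h>

static atomic64_t balloon_target_pages;

static void balloon_process(struct work_struct *work);
static DECLARE_DELAYED_WORK(balloon_worker, balloon_process);

static void balloon_process(struct work_struct *work)
{
    long long target = atomic64_read(&balloon_target_pages);
    /* ...call increase_reservation()/decrease_reservation() here;
     * only this worker ever issues the hypercalls... */
    (void)target;
}

/* Callers just update the target and kick the worker; if the work is
 * already queued, schedule_delayed_work() leaves it as-is. */
void balloon_set_new_target(long long target)
{
    atomic64_set(&balloon_target_pages, target);
    schedule_delayed_work(&balloon_worker, 0);
}
```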
> > I would call this dom0 functionality a bug. I think both Citrix
> > and Oracle use dom0_mem as a normal boot option for every
> > installation and, while I think both employ heuristics to choose
> > a larger dom0_mem for larger physical memory, I don't think it
> > grows large enough for, say, >256GB physical memory, to accommodate
> > the necessarily large number of page tables.
> >
> > So, I'd vote for NOT allowing dom0 to balloon up to physical
> > memory if dom0_mem is specified, and possibly a kernel command
> > line option that allows it to grow beyond. Or, possibly, no
> > option and never allow dom0 memory to grow beyond dom0_mem
> > unless (possibly) it grows with hot-plug.
>
> Yes, it's a bit of a problem. The trouble is that the kernel can't
> really distinguish the two cases; either way, it sees a Xen-supplied
> xen_start_info->nr_pages as the amount of initial memory available, and
> an E820 table referring to more RAM beyond that.
>
> I guess there are three options:
>
>   1. add a "xen_maxmem" (or something) kernel parameter to override
>      space specified in the E820 table
>   2. ignore the E820 if it's a privileged domain
>   3. only allow extra memory up to a certain ratio of the base memory
>      (8x? 16x? 32x?)
>
> I think the third is probably the simplest and least hacky, as it
> directly addresses the underlying issue (and prevents domU mishaps as
> well).

I like 2': ignore the E820 if it is dom0 and dom0_mem has been specified. This most closely conforms to current behavior in shipping systems, and I don't really see a use model for allowing dom0 memory to grow beyond dom0_mem (if dom0_mem is specified).

(1) will most likely result in vendors specifying dom0_mem AND xen_maxmem to the same value, so IMHO will just be confusing.

(2) for non-dom0 privileged domains, I can't offhand come up with a scenario where memory<>maxmem would be valuable, so this would be my second choice (after 2').

(3) seems like policy enforcement with insufficient information, as the "correct" ratio might change in future kernels and we don't even know what it should be now (and it may be very kernel-dependent?).

Dan
On 09/13/2010 05:22 PM, Dan Magenheimer wrote:
> [...]
> I like 2': ignore the E820 if it is dom0 and dom0_mem has been specified.
> This most closely conforms to current behavior in shipping systems,
> and I don't really see a use model for allowing dom0 memory to
> grow beyond dom0_mem (if dom0_mem is specified).

The kernel never gets to see dom0_mem, since that's a Xen parameter. And only applying the limit to dom0 doesn't help with domUs, which potentially have the same problem.

> (1) will most likely result in vendors specifying dom0_mem AND
> xen_maxmem to the same value, so IMHO will just be confusing.

Who for? Dom0 won't be able to balloon up, but if dom0 is being managed by a vendor software stack, that doesn't matter much.

> (2) for non-dom0 privileged domains, I can't offhand come up with
> a scenario where memory<>maxmem would be valuable, so this
> would be my second choice (after 2').

What do you mean by "memory<>maxmem"?

> (3) seems like policy enforcement with insufficient information,
> as the "correct" ratio might change in future kernels and
> we don't even know what it should be now (and it may be
> very kernel-dependent?).

Not really. The main structure that scales with memory size is the struct page array, which is about 64 bytes per page (so, putting an upper limit of ~64x if you don't mind all kernel memory being filled with struct pages). For systems with highmem it must be in lowmem, so the scaling would be on base low memory, not all base memory.

But, true, if you don't intend to balloon up, there's no point wasting memory on unused page structures.

J
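The ~64x figure can be sanity-checked with a little arithmetic (assuming 4 KB pages and a 64-byte struct page; both are config-dependent):

```c
/* Sanity check for the ~64x upper bound above: with 64 bytes of struct
 * page per 4096-byte page, metadata is 1/64 of the memory it describes,
 * so describing 64x the base allocation consumes the whole base. */
#include <stdio.h>

int main(void)
{
    double page_size = 4096.0, struct_page_size = 64.0;
    double base_mb = 1024.0;                        /* e.g. a 1 GB base */
    double ratio = page_size / struct_page_size;    /* = 64 */

    printf("metadata overhead: %.2f%% of described memory\n",
           100.0 * struct_page_size / page_size);   /* ~1.56% */
    printf("struct pages to describe %.0fx base: %.0f MB (= all of base)\n",
           ratio, ratio * base_mb * struct_page_size / page_size);
    return 0;
}
```

The ~1.56% per-page overhead this prints also matches Dan's "~1.5%" figure later in the thread.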
On Mon, 2010-09-13 at 22:17 +0100, Dan Magenheimer wrote:
> [...]
> I would call this dom0 functionality a bug. I think both Citrix
> and Oracle use dom0_mem as a normal boot option for every
> installation and, while I think both employ heuristics to choose
> a larger dom0_mem for larger physical memory, I don't think it
> grows large enough for, say, >256GB physical memory, to accommodate
> the necessarily large number of page tables.

FWIW XenServer statically uses dom0_mem=752M and then balloons down on smaller systems where so much domain 0 memory is not required; the minimum is 128M or 256M or something.

A 32on64 domain 0 kernel fails to boot if dom0_mem is more than around 56G because it runs out of lowmem for the page array. I suspect that for some period before that the system isn't terribly usable due to the low amount of available lowmem, even if it does manage to boot.

> So, I'd vote for NOT allowing dom0 to balloon up to physical
> memory if dom0_mem is specified, and possibly a kernel command
> line option that allows it to grow beyond. Or, possibly, no
> option and never allow dom0 memory to grow beyond dom0_mem
> unless (possibly) it grows with hot-plug.

Ian.
>>> On 14.09.10 at 00:51, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> I guess there are three options:
>
>   1. add a "xen_maxmem" (or something) kernel parameter to override
>      space specified in the E820 table

Isn't that what the (native) "mem=" option is intended for?

>   2. ignore the E820 if it's a privileged domain

I think the upper limit specified by the machine E820 should be ignored here, and an option (as per 1.) should be available.

>   3. only allow extra memory up to a certain ratio of the base memory
>      (8x? 16x? 32x?)

Enforcing a sane upper limit of the ratio (we use 32x currently) seems like a reasonable thing to do in any case.

Jan
On Mon, 2010-09-13 at 23:51 +0100, Jeremy Fitzhardinge wrote:
> [...]
> Yes, it's a bit of a problem. The trouble is that the kernel can't
> really distinguish the two cases; either way, it sees a Xen-supplied
> xen_start_info->nr_pages as the amount of initial memory available, and
> an E820 table referring to more RAM beyond that.
>
> I guess there are three options:
>
>   1. add a "xen_maxmem" (or something) kernel parameter to override
>      space specified in the E820 table
>   2. ignore the E820 if it's a privileged domain

As it stands I don't think it is currently possible to boot any domain 0 kernel pre-ballooned other than by using the native mem= option.

I think the Right Thing to do would be for privileged domains to combine the results of XENMEM_machine_memory_map (the real e820) and XENMEM_memory_map (the pseudo-physical "e820") by clamping the result of XENMEM_machine_memory_map at the maximum given in XENMEM_memory_map (or taking some sort of union).

Then if we wanted to support pre-ballooning domain 0 via hypervisor-only interfaces for some reason in the future, we would need to add a new option, dom0_maxmem= or so, which populated the result of XENMEM_memory_map with the appropriate size. I think this would be consistent with the behaviour for a non-privileged domain: dom0_mem= and dom0_maxmem= would correspond to memory= and maxmem= in a domU configuration file.

However, although I think that is the Right Thing, I don't think having domain 0 cut off its e820 at nr_pages unless overridden by mem= would be a problem in practice, and it certainly wins in terms of the complexity of reconciling XENMEM_memory_map and XENMEM_machine_memory_map.

Ian.
> true, if you don't intend to balloon up, there's no point wasting
> memory on unused page structures.

I think this is the key. If dom0_mem is NOT specified, dom0 launches with (essentially) all the physical memory of the machine, page tables are allocated in dom0 to map all of physical memory, and auto-ballooning is necessary to launch guests.

If dom0_mem IS specified, it is often a much smaller number than the size of physical memory; why waste ~1.5% of physical memory on page structures that will never be used?

If someone wants to add an option to augment dom0_mem to allow memory-up-ballooning of dom0 above dom0_mem (and can justify a reason why some user might ever use this functionality), that's fine. But let's not change the definition of the dom0_mem option just because a bug fix happens to make it possible.

My two cents,
Dan
On 09/14/2010 01:41 AM, Jan Beulich wrote:
>>>> On 14.09.10 at 00:51, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
>> I guess there are three options:
>>
>>   1. add a "xen_maxmem" (or something) kernel parameter to override
>>      space specified in the E820 table
> Isn't that what the (native) "mem=" option is intended for?

Good point, there's no need to add anything.

>>   3. only allow extra memory up to a certain ratio of the base memory
>>      (8x? 16x? 32x?)
> Enforcing a sane upper limit of the ratio (we use 32x currently)
> seems like a reasonable thing to do in any case.

OK, I'll do that.

J
On 09/14/2010 02:07 AM, Ian Campbell wrote:
> [...]
> As it stands I don't think it is currently possible to boot any domain 0
> kernel pre-ballooned other than by using the native mem= option.
>
> I think the Right Thing to do would be for privileged domains to combine
> the results of XENMEM_machine_memory_map (the real e820) and
> XENMEM_memory_map (the pseudo-physical "e820") by clamping the result of
> XENMEM_machine_memory_map at the maximum given in XENMEM_memory_map (or
> taking some sort of union).

Does the dom0 domain builder bother to set a pseudo-phys E820?

> However, although I think that is the Right Thing, I don't think having
> domain 0 cut off its e820 at nr_pages unless overridden by mem= would be
> a problem in practice, and it certainly wins in terms of the complexity of
> reconciling XENMEM_memory_map and XENMEM_machine_memory_map.

Indeed. I think adding a general 32x limit between base and max size will prevent a completely unusable system, and then we can just suggest using mem= to control that more precisely (esp. for dom0).

J
On 09/14/2010 08:06 AM, Dan Magenheimer wrote:
>> true, if you don't intend to balloon up, there's no point wasting
>> memory on unused page structures.
> I think this is the key. If dom0_mem is NOT specified, dom0
> launches with (essentially) all the physical memory of the
> machine, page tables are allocated in dom0 to map all of physical
> memory, and auto-ballooning is necessary to launch guests.
>
> If dom0_mem IS specified, it is often a much smaller number
> than the size of physical memory; why waste ~1.5% of physical memory
> on page structures that will never be used?
>
> If someone wants to add an option to augment dom0_mem to allow
> memory-up-ballooning of dom0 above dom0_mem (and can justify
> a reason why some user might ever use this functionality),
> that's fine. But let's not change the definition of the
> dom0_mem option just because a bug fix happens to make it
> possible.

Technically (pedantically), the meaning of dom0_mem is unchanged - it sets the initial number of pages given to the domain, and is functionally identical to the normal "memory" parameter in a domU config file. The difference is that we're now paying attention to the E820 map, which is set by maxmem= in domU, but is the hardware/BIOS one in dom0.

I'm not sure what I'm doing that's different to the xenolinux kernels; I guess they hack up the whole memory init path more aggressively. But the pvops behaviour is more or less the straightforward outcome of looking at the Xen-provided E820 and reserving the gaps between the actual page count and the memory described therein.

J
On Tue, 2010-09-14 at 17:42 +0100, Jeremy Fitzhardinge wrote:
> On 09/14/2010 02:07 AM, Ian Campbell wrote:
> [...]
> > I think the Right Thing to do would be for privileged domains to combine
> > the results of XENMEM_machine_memory_map (the real e820) and
> > XENMEM_memory_map (the pseudo-physical "e820") by clamping the result of
> > XENMEM_machine_memory_map at the maximum given in XENMEM_memory_map (or
> > taking some sort of union).
>
> Does the dom0 domain builder bother to set a pseudo-phys E820?

I thought the default with XENMEM_memory_map was to construct a fake 0..startinfo->nr_pages-sized e820, which would have been sensible, but it turns out that's not what happens. In fact XENMEM_memory_map will return ENOSYS in that case, and guests are expected to construct the fake e820 themselves.

> > However, although I think that is the Right Thing, I don't think having
> > domain 0 cut off its e820 at nr_pages unless overridden by mem= would be
> > a problem in practice, and it certainly wins in terms of the complexity of
> > reconciling XENMEM_memory_map and XENMEM_machine_memory_map.
>
> Indeed. I think adding a general 32x limit between base and max size will
> prevent a completely unusable system, and then we can just suggest using
> mem= to control that more precisely (esp. for dom0).

Sounds reasonable.

Ian.
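A sketch of the fallback Ian describes, with the hypercall stubbed out - the helper name and sample sizes are invented for illustration: if XENMEM_memory_map returns ENOSYS, the guest builds a single E820_RAM entry covering 0..nr_pages itself.

```c
/* Illustrative sketch of the ENOSYS fallback (hypercall stubbed out). */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

#define E820_RAM   1
#define PAGE_SHIFT 12

struct e820entry { uint64_t addr, size; uint32_t type; };

/* Stand-in for HYPERVISOR_memory_op(XENMEM_memory_map, ...). */
static int xenmem_memory_map(struct e820entry *map, unsigned *nr)
{
    (void)map; (void)nr;
    return -ENOSYS;   /* what a plain domU sees, per the discussion above */
}

int main(void)
{
    struct e820entry map[32];
    unsigned nr = 32;
    uint64_t nr_pages = (512ULL << 20) >> PAGE_SHIFT; /* from start_info */

    if (xenmem_memory_map(map, &nr) == -ENOSYS) {
        /* No map supplied: synthesize one flat RAM entry 0..nr_pages. */
        map[0].addr = 0;
        map[0].size = nr_pages << PAGE_SHIFT;
        map[0].type = E820_RAM;
        nr = 1;
    }
    printf("e820: %#llx-%#llx RAM\n",
           (unsigned long long)map[0].addr,
           (unsigned long long)(map[0].addr + map[0].size));
    return 0;
}
```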
On Tue, 2010-09-14 at 23:05 +0100, Jeremy Fitzhardinge wrote:
> [...]
> Technically (pedantically), the meaning of dom0_mem is unchanged - it
> sets the initial number of pages given to the domain, and is
> functionally identical to the normal "memory" parameter in a domU config
> file. The difference is that we're now paying attention to the E820
> map, which is set by maxmem= in domU, but is the hardware/BIOS one in dom0.
>
> I'm not sure what I'm doing that's different to the xenolinux kernels; I
> guess they hack up the whole memory init path more aggressively. But
> the pvops behaviour is more or less the straightforward outcome of
> looking at the Xen-provided E820 and reserving the gaps between the
> actual page count and the memory described therein.

xenolinux treats XENMEM_memory_map and XENMEM_machine_memory_map as separate things in some weird split-brain understanding of the physical address space. Try looking in /proc/iomem on a xenolinux kernel -- IIRC it has a mish-mash of both address spaces in it... What pvops does is far more sane.

Ian.
On 09/15/2010 12:10 AM, Ian Campbell wrote:
>> Indeed. I think adding a general 32x limit between base and max size will
>> prevent a completely unusable system, and then we can just suggest using
>> mem= to control that more precisely (esp. for dom0).
> Sounds reasonable.

I found 32x doesn't work; there seems to be a lot more per-page overhead than I expected. I made the limit 10x, which I determined empirically and somewhat arbitrarily, but it does seem reasonable.

J
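In sketch form, the clamp is just a ratio applied to the initial allocation; the constant's name here is invented, and the thread only establishes that the value ended up at 10x after 32x proved too generous:

```c
/* Illustrative sketch of the ratio clamp discussed above. */
#include <stdio.h>
#include <stdint.h>

#define EXTRA_MEM_RATIO 10ULL   /* assumed; was 32x before testing */

static uint64_t clamp_extra_pages(uint64_t base_pages, uint64_t extra_pages)
{
    uint64_t limit = EXTRA_MEM_RATIO * base_pages;
    return extra_pages > limit ? limit : extra_pages;
}

int main(void)
{
    uint64_t base  = 262144;        /* 1 GB at 4 KB pages */
    uint64_t extra = 16 * 262144;   /* toolstack offered 16 GB extra */

    printf("extra pages allowed: %llu (asked for %llu)\n",
           (unsigned long long)clamp_extra_pages(base, extra),
           (unsigned long long)extra);
    return 0;
}
```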
> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> Sent: Wednesday, September 15, 2010 11:29 AM
> To: Ian Campbell
> Cc: Dan Magenheimer; Stefano Stabellini; Xen-devel@lists.xensource.com;
> Daniel Kiper; Konrad Wilk
> Subject: Re: [Xen-devel] RE: Ballooning up
>
> On 09/15/2010 12:10 AM, Ian Campbell wrote:
> >> Indeed. I think adding a general 32x limit between base and max size
> >> will prevent a completely unusable system, and then we can just suggest
> >> using mem= to control that more precisely (esp. for dom0).
> > Sounds reasonable.
>
> I found 32x doesn't work; there seems to be a lot more per-page overhead
> than I expected. I made the limit 10x, which I determined empirically
> and somewhat arbitrarily, but it does seem reasonable.

Any idea what amount/percent of memory is "wasted" with this limit? (e.g. assuming a system with 10GB physical memory and dom0_mem=1G and no up-ballooning)

So if one knows a priori that dom0 will not be ballooned up above dom0_mem, one specifies dom0_mem= on the Xen boot line and mem= on the dom0 "module" line?

IIRC the Linux mem=1G option doesn't really limit physical memory to 1G, just specifies the highest legal address, ignoring holes. Dunno if dom0_mem has this problem (on xenolinux) but I think it does not.
On 09/15/2010 11:06 AM, Dan Magenheimer wrote:
> [...]
> Any idea what amount/percent of memory is "wasted" with this limit?
> (e.g. assuming a system with 10GB physical memory and dom0_mem=1G
> and no up-ballooning)

Not sure precisely. It probably depends quite a bit on your kernel config (32- vs 64-bit, the various sparsemem options, among other things).

> So if one knows a priori that dom0 will not be ballooned up
> above dom0_mem, one specifies dom0_mem= on the Xen boot line
> and mem= on the dom0 "module" line?

Yes, but if dom0_mem is more than about 3G it's probably worth setting mem= to dom0_mem+1G.

> IIRC the Linux mem=1G option doesn't really limit physical
> memory to 1G, just specifies the highest legal address, ignoring
> holes. Dunno if dom0_mem has this problem (on xenolinux)
> but I think it does not.

It does effectively, because the kernel will free any Xen-provided memory which lies in the PCI hole (and any other holes in the E820 map). It will then add that freed memory to the extra memory space so you can balloon it back in again - but if you use "mem=" with the same limit as dom0_mem then it will truncate that region so you can't use it (and those pages are lost to dom0, but usable by other domains).

The xenolinux kernel treated the machine physical memory address space and the pseudo-physical address space as two completely distinct ones; as a result it was quite happy to have RAM at the same address as PCI devices. This gets very messy in modern-ish (post about 2.6.20) kernels because they maintain a unified resource tree which tracks device mappings and RAM in the same structure.

To avoid this mess, the pvops dom0 kernels treat them as being in the same address space, so the kernel punches holes out of the RAM where devices want to live. However, the Xen domain builder still maps memory from 0 to dom0_mem linearly, and the kernel needs to free up any memory which overlaps with devices (i.e. E820 holes). This means that it's generally impossible to give dom0 between 3G and 4G of memory, as the memory in the holes is always freed. If you want to give the domain 4G of accessible RAM then you need to set dom0_mem to 4G, mem=5G, and balloon in all the freed pages.

J
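Rough arithmetic behind the 4G example, assuming a single 1 GB PCI hole at 3G-4G (real hole layouts vary by machine):

```c
/* The 1 GB hole at 3G-4G is an assumption for illustration only. */
#include <stdio.h>

int main(void)
{
    unsigned long long gb = 1ULL << 30;
    unsigned long long dom0_mem   = 4 * gb;   /* mapped linearly from 0 */
    unsigned long long hole_start = 3 * gb, hole_end = 4 * gb;

    /* Memory overlapping the hole is freed into the balloon at boot. */
    unsigned long long freed = hole_end - hole_start;
    unsigned long long usable_at_boot = dom0_mem - freed;

    printf("usable at boot: %llu GB, freed to balloon: %llu GB\n",
           usable_at_boot / gb, freed / gb);
    /* mem=5G leaves pseudo-physical address space above 4G so the freed
     * gigabyte can be ballooned back in later. */
    return 0;
}
```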
(rolling back to the original pre-drift topic)

> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> Subject: Re: [Xen-devel] Ballooning up
>
> On 09/07/2010 08:14 PM, Ian Campbell wrote:
> > Cool. What did the issue with plymouth and friends turn out to be?
>
> It was totalram_pages getting decremented when pages were being appended
> to the balloon, even though those pages were never counted. [...]

I went to try this out, and it appears to me that the patch that implements this is built on top of a fairly significant sequence of E820-ish patches, none of which is upstream? True? Or is my rudimentary git knowledge misleading me?

This is important because the maxmem= functionality is primarily of use in domU, and it appears to be present in 2.6.18-based PV kernels but not in 2.6.32 (or later) pvops kernels, so this will appear to be a functionality regression.
On 09/15/2010 02:47 PM, Dan Magenheimer wrote:
> [...]
> I went to try this out, and it appears to me that the patch
> that implements this is built on top of a fairly significant
> sequence of E820-ish patches, none of which is upstream? True?
> Or is my rudimentary git knowledge misleading me?
>
> This is important because the maxmem= functionality is primarily of
> use in domU, and it appears to be present in 2.6.18-based PV kernels
> but not in 2.6.32 (or later) pvops kernels, so this will appear to be
> a functionality regression.

There are a number of pre-req patches to make it all work. I'm in the process of putting together a branch for upstreaming them.

J