Dan,

after tmem was turned on by default, we're getting reports of domain
creation failures which go away as soon as tmem=0 gets specified. In
particular we see this happen even when several GB of freeable memory
are reported. As I understand it, this comes down to multi-page
allocations not being handled in the tmem code; specifically, the
shadow code tries to allocate a non-negligible number of order-2 pages
(and IIRC the IOMMU code also continues to require - at least on large
systems - higher-order allocations).

Unless I'm misunderstanding something, this is a design limitation
that can only be overcome by eliminating all post-boot non-order-0
allocations that cannot fall back to order-0 ones, and hence
defaulting tmem to on should be reconsidered.

Besides that, in trying to reproduce this I also get the impression
that tmem's memory consumption may significantly depend on the type of
file system used - on my test box (using reiserfs) I cannot get tmem
to consume any memory. Any explanation for this (I did verify that
there are pools for each of the partitions)?

Thanks, Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
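[Editorial note: to make the failure mode concrete, here is a toy model of a buddy-style heap - illustrative only, not Xen's actual allocator. It shows how a heap can report plenty of free memory while still failing any order-2 (four contiguous pages) request.]

```python
# Toy buddy-allocator model (illustration only, not Xen code):
# free memory is tracked per order; an order-2 request needs a block
# of 4 contiguous pages and cannot be built from loose order-0 pages.

class ToyBuddy:
    def __init__(self):
        self.free_lists = {order: [] for order in range(5)}

    def free_pages(self):
        # total free pages across all orders
        return sum(len(blocks) << order
                   for order, blocks in self.free_lists.items())

    def alloc(self, order):
        # take a block of the requested order, splitting a larger one if needed
        for o in range(order, 5):
            if self.free_lists[o]:
                block = self.free_lists[o].pop()
                while o > order:
                    o -= 1
                    self.free_lists[o].append(block + (1 << o))  # free the buddy half
                return block
        return None  # no sufficiently large contiguous block exists

heap = ToyBuddy()
# tmem has absorbed everything and can relinquish only single pages:
heap.free_lists[0] = list(range(0, 4096, 4))   # 1024 scattered order-0 pages

assert heap.free_pages() == 1024               # plenty of "freeable" memory...
assert heap.alloc(2) is None                   # ...yet an order-2 request fails
assert heap.alloc(0) is not None               # an order-0 fallback would succeed
```

This is why "several GB freeable" and "not enough memory" are not contradictory: freeable order-0 pages do not compose into higher-order contiguous blocks.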
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 29.01.10 21:03 >>>
>I don't see how tmem could be causing this problem because it shouldn't
>even be in use (even with tmem=1) unless the domains in use call it.

We have the tmem kernel side patch in place, so any domain using our
kernel and a tmem-enabled file system would (potentially) use it.

>And the tmem code that limits to single page allocation should IIRC
>only be invoked once all system memory has been absorbed (which
>I suppose could be the case for dom0 when dom0_mem is not specified?)

Or if enough VMs had been created to consume all freely available
(outside of tmem) memory. As said in the original mail - we have seen
systems with more than 2G "freeable" memory still causing domain
creation to fail with "not enough memory".

>I *think* maybe non-pvops dom0 has it on, but pvops almost certainly
>does not.

Correct - and our kernel is non-pvops.

>Rather than turn it off in Xen, maybe best solution is to turn it
>off in non-pvops dom0, at least by default. Or really there's
>no reason it needs to be on for dom0 at all, so Xen code could
>check for dom0 and disable it if dom0.

I don't know that much about tmem, but why would Dom0 not benefit from
it being enabled if there's memory available in Xen?

Jan
Sorry for the delay... I was away from the office.

> >Rather than turn it off in Xen, maybe best solution is to turn it
> >off in non-pvops dom0, at least by default. Or really there's
> >no reason it needs to be on for dom0 at all, so Xen code could
> >check for dom0 and disable it if dom0.
>
> I don't know that much about tmem, but why would Dom0 not benefit
> from it being enabled if there's memory available in Xen?

Oops, yes, I misspoke. There are uses for tmem from dom0, especially
when other guests are using "file:" virtual disks (i.e. *not* O_DIRECT)
and those other guests are not tmem-aware.

> >And the tmem code that limits to single page allocation should IIRC
> >only be invoked once all system memory has been absorbed (which
> >I suppose could be the case for dom0 when dom0_mem is not specified?)
>
> Or if enough VMs had been created to consume all freely available
> (outside of tmem) memory.
>
> As said in the original mail - we have seen systems with more than 2G
> "freeable" memory still causing domain creation to fail with
> "not enough memory".

I can guess at a few ways this could happen, but would probably need
more info about the specific environment to be sure:

A) There is some order>0 memory allocation in Xen domain creation that
doesn't fall back to order=0, which I've not seen in my testing but
shows up on your systems (or has very recently been added). If true,
this is a problem not only for tmem but also for all other memory
optimization work, and we need to identify and (if possible) fix it.

B) The code in domain creation that frees freeable memory (see
tools/python/xen/xend/balloon.py, search for "tmem") is computing a
too-small value, maybe due to recent changes reserving "hidden extra"
memory for domains.

C) The above code is not getting executed at all during domain
creation, perhaps because you have different/old domain creation tools
that aren't aware of tmem (or use some API that I didn't know about
and consequently didn't test).
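[Editorial note: hypothesis B amounts to a one-line arithmetic check. The sketch below is purely illustrative - the function name, parameters, and the slack term are assumptions, not balloon.py's actual API.]

```python
def enough_memory_for_domain(free_kb, freeable_kb, need_kb, slack_kb=0):
    """Conceptual model of a tmem-aware admission check: domain creation
    may proceed if currently free memory plus memory tmem can relinquish
    covers the request plus any reserved "hidden extra" slack.
    Hypothesis B is that slack_kb (or freeable_kb) is miscomputed."""
    return free_kb + freeable_kb >= need_kb + slack_kb

# With 2 GB freeable the check passes - yet creation can still fail
# later if the freed pages are not contiguous enough (hypothesis A):
assert enough_memory_for_domain(free_kb=64 * 1024,
                                freeable_kb=2 * 1024 * 1024,
                                need_kb=1024 * 1024)
```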
Any thoughts?

Thanks,
Dan
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 04.02.10 22:01 >>>
>A) There is some order>0 memory allocation in Xen domain creation
>that doesn't fall back to order=0, that I've not seen in my testing
>but shows up in your systems (or has very recently been added).
>If true, this is a problem not only for tmem but also for all other
>memory optimization work and we need to identify and (if possible)
>fix it.

This is the case - x86's shadow code does all its allocations as
order-2, with no fallback (nor is one possible, the way things are
designed). Also, the domain structure itself is of order 4 (38k),
obviously without fallback (and even with an address range
restriction, though that one affects only really big machines, and it
could be lifted to effectively not be a restriction anymore, i.e. just
serve documentation purposes).

Jan
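[Editorial note: for reference, the order arithmetic works out as follows. `get_order` here mirrors the usual kernel-style helper, not Xen's exact code: a 38k structure needs 10 pages, which rounds up to a 16-page (order-4, 64k) block, so over a third of the block is wasted.]

```python
import math

PAGE_SIZE = 4096

def get_order(size_bytes):
    # smallest order whose block (2**order pages) covers the allocation
    pages = math.ceil(size_bytes / PAGE_SIZE)
    return max(0, math.ceil(math.log2(pages)))

assert get_order(38 * 1024) == 4                            # 10 pages -> 16-page block
assert (1 << get_order(38 * 1024)) * PAGE_SIZE == 64 * 1024 # 64k allocated for 38k
```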
> >>> Dan Magenheimer <dan.magenheimer@oracle.com> 04.02.10 22:01 >>>
> >A) There is some order>0 memory allocation in Xen domain creation
> >that doesn't fall back to order=0, that I've not seen in my testing
> >but shows up in your systems (or has very recently been added).
> >If true, this is a problem not only for tmem but also for all other
> >memory optimization work and we need to identify and (if possible)
> >fix it.
>
> This is the case - x86's shadow code does all its allocations as
> order-2, with no fallback (nor is one possible, the way things are
> designed).

Hmmm.... so this would affect HVM creation but not PV creation, but
could also cause PV live migration to fail, correct?

> Also, the domain structure itself is of order 4 (38k), obviously
> without fallback (and even with an address range restriction, though
> that one affects only really big machines, and it could be lifted to
> effectively not be a restriction anymore, i.e. just serve
> documentation purposes).

This has likely been avoided by luck when lots of memory is flushed
from tmem and returned to the Xen heap and consolidated.

Are you suggesting that the domain structure could/should have two
sizes, dynamically chosen by machine size? Or something else?

In any case, I'd still suggest that turning tmem off in your dom0 is
the best short-term solution.
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 05.02.10 19:44 >>>
>Hmmm.... so this would affect HVM creation but not PV creation, but
>could also cause PV live migration to fail, correct?

Yes.

>This has likely been avoided by luck when lots of memory is
>flushed from tmem and returned to the Xen heap and consolidated.
>
>Are you suggesting that the domain structure could/should have
>two sizes, dynamically chosen by machine size? Or something
>else?

No, it just should be split into parts, each of which fits in a page
independent of architecture. But that's nothing I would consider
realistic for 4.0.

>In any case, I'd still suggest turning tmem off in your dom0
>is the best short-term solution.

I'm still not following you here: For one, I can't recall a way to
turn off tmem on a per-domain basis. Then I can't see why it should be
only our Dom0 that is affected. And finally I can't see how the same
couldn't happen when only DomU-s use tmem.

Jan
> >This has likely been avoided by luck when lots of memory is
> >flushed from tmem and returned to the Xen heap and consolidated.
> >
> >Are you suggesting that the domain structure could/should have
> >two sizes, dynamically chosen by machine size? Or something
> >else?
>
> No, it just should be split into parts, each of which fits in a page
> independent of architecture. But that's nothing I would consider
> realistic for 4.0.

OK. Agreed this is too big a change for 4.0, but I'm thinking about
post-4.0.

The order=2 shadow page allocation should also probably be considered
a "bug" for post-4.0 as, I think, even ballooning will eventually
fragment memory, and theoretically 75% of physical memory might be
unused while domain creation (or PV migration) still fails. Since (I
think) this affects other Xen 4.0 dynamic memory utilization
solutions, I'll post a separate basenote to discuss that.

> >In any case, I'd still suggest turning tmem off in your dom0
> >is the best short-term solution.
>
> I'm still not following you here: For one, I can't recall a way to
> turn off tmem on a per-domain basis. Then I can't see why it should
> be only our Dom0 that is affected. And finally I can't see how the
> same couldn't happen when only DomU-s use tmem.

I'm suggesting disabling CONFIG_TMEM in the default dom0 compile (for
all dom0 for now). Then only environments that consciously run a domU
with a tmem-enabled kernel could be affected. The failure can only
occur if at least one domU/dom0 enables tmem, and even then should
only occur in certain workloads, though I suppose eventually
sufficient fragmentation may occur in any workload.

Dan
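[Editorial note: the 75% figure follows from the worst case where every aligned 4-page group retains exactly one allocated page - three quarters of memory is then free, yet no order-2 block exists. A toy check, illustrative only:]

```python
PAGES = 1024  # toy machine: 1024 pages, i.e. 256 aligned groups of 4

# worst case: one page remains pinned in every aligned 4-page group
allocated = {g * 4 for g in range(PAGES // 4)}
free = [p for p in range(PAGES) if p not in allocated]

def order2_possible(free_pages):
    free_set = set(free_pages)
    # an order-2 block is 4 contiguous pages aligned to a multiple of 4
    return any(all(base + i in free_set for i in range(4))
               for base in range(0, PAGES, 4))

assert len(free) == PAGES * 3 // 4     # 75% of memory is free...
assert not order2_possible(free)       # ...but no order-2 allocation can succeed
```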
> > of tmem on a per-domain basis. Then I can't see why it should be
> > only our Dom0 that is affected. And finally I can't see how the
> > same couldn't happen when only DomU-s use tmem.
>
> I'm suggesting disabling CONFIG_TMEM in the default dom0 compile

Oops, I see in re-reading your earlier posts that you are enabling it
by default for domU as well. In that case I agree, sadly, that your
best choice might be to disable tmem completely in your Xen
hypervisor, at least until the Xen fragmentation issues are resolved.

Dan
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 08.02.10 18:18 >>>
>I'm suggesting disabling CONFIG_TMEM in the default dom0 compile
>(for all dom0 for now). Then only environments that consciously
>run a domU with a tmem-enabled kernel could be affected.

That's not a realistic option: Why would one ship separate Dom0 and
DomU kernels when one can do? At least we don't, so compile-time
disabling is out of the question.

Jan
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 08.02.10 19:30 >>>
>Oops, I see in re-reading your earlier posts that you
>are enabling it by default for domU as well. In that
>case I agree, sadly, that your best choice might be to
>disable tmem completely in your Xen hypervisor, at least
>until the Xen fragmentation issues are resolved.

And why would that not hold for the upstream version? That's what the
mail thread was about - whether for 4.0 defaulting tmem to on should
be reverted. I hence take your above statement as a 'yes' to that
question. Keir, will you do this then, or should I submit a revert
patch?

Jan
On 09/02/2010 09:25, "Jan Beulich" <JBeulich@novell.com> wrote:

>> Oops, I see in re-reading your earlier posts that you
>> are enabling it by default for domU as well. In that
>> case I agree, sadly, that your best choice might be to
>> disable tmem completely in your Xen hypervisor, at least
>> until the Xen fragmentation issues are resolved.
>
> And why would that not hold for the upstream version? That's what
> the mail thread was about - whether for 4.0 defaulting tmem to on
> should be reverted. I hence take your above statement as a 'yes'
> to that question. Keir, will you do this then, or should I submit a
> revert patch?

Yes, I can do it. It's a shame though.

 -- Keir
Hi,

At 17:18 +0000 on 08 Feb (1265649531), Dan Magenheimer wrote:
> The order=2 shadow page allocation should also probably be
> considered a "bug" for post-4.0 as, I think, even ballooning
> will eventually fragment memory and theoretically 75% of
> physical memory might be unused and domain creation (or PV
> migration) will fail.

I think the correct approach to all of this is to move system-wide to
allocating memory in 2MB contiguous aligned chunks. There's no sense
in doing guest allocations any finer-grained than that, and there are
noticeable performance wins from all the superpage support that's gone
in recently. Then little things like needing 16k contiguous areas
just go away.

If that's not acceptable, I believe Christian Limpach looked at
removing the requirement for shadow memory to be contiguous as part of
his hosted-Xen project.

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
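[Editorial note: Tim's proposal as a toy model - names are illustrative, not Xen code. If guest memory is only ever handed out in 2MB-aligned (512-page) chunks, any sub-2MB contiguous need can be carved from a single chunk, so requests like a 16k (4-page) shadow allocation are satisfied by construction.]

```python
CHUNK_PAGES = 512  # 2MB / 4kB pages

class ChunkedHeap:
    """Toy model of allocating guest memory only in whole 2MB chunks."""
    def __init__(self, total_pages):
        # heap starts as a list of aligned chunk base frame numbers
        self.free_chunks = list(range(0, total_pages, CHUNK_PAGES))

    def alloc_chunk(self):
        # all guest allocations are whole 2MB chunks
        return self.free_chunks.pop() if self.free_chunks else None

    def alloc_contig(self, pages):
        # any need for <= 512 contiguous pages (e.g. a 16k shadow
        # allocation) is trivially satisfied from within one chunk
        assert pages <= CHUNK_PAGES
        base = self.alloc_chunk()
        return None if base is None else (base, base + pages)

heap = ChunkedHeap(total_pages=4 * CHUNK_PAGES)
span = heap.alloc_contig(4)            # a 16k contiguous area "just goes away"
assert span is not None and span[1] - span[0] == 4
```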
>>> Tim Deegan <Tim.Deegan@citrix.com> 09.02.10 11:45 >>>
>I think the correct approach to all of this is to move system-wide to
>allocating memory in 2MB contiguous aligned chunks. There's no sense in
>doing guest allocations any finer-grained than that and there are
>noticeable performance wins from all the superpage support that's gone
>in recently. Then little things like needing 16k contiguous areas just
>go away.

I have to admit that I can't see how this would work with ballooning,
or (if the balloon driver was adjusted to deal with this) with
fragmentation inside Dom0 (or any other guest that memory is intended
to be removed from). Nor am I sure tmem could be changed to deal with
2MB chunks instead of 4k ones.

Jan
On 09/02/2010 13:00, "Jan Beulich" <JBeulich@novell.com> wrote:

>> I think the correct approach to all of this is to move system-wide to
>> allocating memory in 2MB contiguous aligned chunks. There's no sense in
>> doing guest allocations any finer-grained than that and there are
>> noticeable performance wins from all the superpage support that's gone
>> in recently. Then little things like needing 16k contiguous areas just
>> go away.
>
> I have to admit that I can't see how this would work with ballooning,
> or (if the balloon driver was adjusted to deal with this) with
> fragmentation inside Dom0 (or any other guest that memory is
> intended to be removed from). Nor am I sure tmem could be
> changed to deal with 2MB chunks instead of 4k ones.

The balloon driver is the obvious fly in the ointment that I can see,
too.

 -- Keir
At 13:31 +0000 on 09 Feb (1265722267), Keir Fraser wrote:
> On 09/02/2010 13:00, "Jan Beulich" <JBeulich@novell.com> wrote:
> > I have to admit that I can't see how this would work with ballooning,
> > or (if the balloon driver was adjusted to deal with this) with
> > fragmentation inside Dom0 (or any other guest that memory is
> > intended to be removed from). Nor am I sure tmem could be
> > changed to deal with 2MB chunks instead of 4k ones.
>
> The balloon driver is the obvious fly in the ointment that I can
> see, too.

Good point. That's going to be a problem for HVM ballooning,
especially on EPT/NPT where having superpage allocations makes a big
difference.

In the meantime we can fix the shadow code. Unfortunately I won't be
able to look at it immediately, but maybe Christian has a patch.

Tim.
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, February 09, 2010 2:30 AM
> Subject: Re: tmem - really default to on?
>
> On 09/02/2010 09:25, "Jan Beulich" <JBeulich@novell.com> wrote:
> > And why would that not hold for the upstream version? That's what
> > the mail thread was about - whether for 4.0 defaulting tmem to on
> > should be reverted. I hence take your above statement as a 'yes'
> > to that question. Keir, will you do this then, or should I submit a
> > revert patch?
>
> Yes, I can do it. It's a shame though.
>
> -- Keir

Yes, I agree, as long as we are just flipping the default to off.
Turning tmem on by default served its purpose of shaking out other
problems, but unfortunately too late in the 4.0 release cycle to fix
them all in time for the 4.0 release.

Keir, please revert 20655/20758.

Ideally, I'd like to see the default flipped back to on in
xen-unstable immediately after 4.0 is split off. (And, for the record,
tmem is still IN the 4.0 release; it just needs to be explicitly
enabled, and has some known bugs.)

Dan
Further, I believe switching to 2MB chunks just changes the problem
from external fragmentation to internal fragmentation... or
re-requires (for x86_64) keeping separate allocation pools for xenheap
and domheap.

> In the meantime we can fix the shadow code. Unfortunately I won't be
> able to look at it immediately, but maybe Christian has a patch.

/me looks hopefully in Christian's direction...
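[Editorial note: Dan's internal-fragmentation point in numbers - a toy calculation assuming every allocation is rounded up to whole 2MB chunks, which is the scheme under discussion, not Xen's actual behavior.]

```python
CHUNK = 2 * 1024 * 1024  # 2MB allocation granularity

def internal_waste(request_bytes):
    # bytes lost to rounding the request up to whole 2MB chunks
    chunks = -(-request_bytes // CHUNK)   # ceiling division
    return chunks * CHUNK - request_bytes

# a 38k structure carved as its own chunk wastes over 98% of the chunk:
assert internal_waste(38 * 1024) == CHUNK - 38 * 1024
# whereas a request that is already chunk-sized wastes nothing:
assert internal_waste(CHUNK) == 0
```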