The solution I am working on for how to support Linux hugepages (Xen superpages) involves creating domains made up entirely of superpages. I can create a working domain with superpages and am in the process of supporting it in save/restore.

For this to work properly this should be an attribute of a domain, specified somewhere in domain configuration and attached to that domain for its lifetime. This way it could be checked at memory populate time, save/restore time, and by the various balloon drivers.

My question is for those of you who best know the overall Xen design principles. Where should the flag be specified by the user? Where should it best be set in the running domain? I've seen some examples of flags being passed around, but would like some guidance on the best place to put it to fit into the Xen design.

Thanks,
Dave McCracken
Oracle Corp.
Keir Fraser
2009-Mar-02 13:58 UTC
Re: [Xen-devel] Design question for PV superpage support
On 02/03/2009 13:54, "Dave McCracken" <dcm@mccr.org> wrote:

> My question is for those of you who best know the overall Xen design principles. Where should the flag be specified by the user? Where should it best be set in the running domain? I've seen some examples of flags being passed around, but would like some guidance on the best place to put it to fit into the Xen design.

Specify it in the domain config file, and also stick it in xenstore somewhere. From there it should be possible to get it picked up and packed into the save/restore file pretty much automatically, I think, and you can also make it accessible there to balloon drivers.

 -- Keir
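
As a minimal sketch of what Keir suggests, assuming libxenstore's xs_write() and a made-up key path and flag name (neither is an existing Xen interface), the toolstack side might look roughly like:

    #include <stdio.h>
    #include <string.h>
    #include <xs.h>

    /* Hypothetical helper: mirror a per-domain "superpages" flag into
     * xenstore so save/restore tools and balloon drivers can find it.
     * The key path is illustrative only. */
    static int record_superpage_flag(struct xs_handle *xsh, int domid,
                                     int enabled)
    {
        char path[64], val[2];

        snprintf(path, sizeof(path), "/local/domain/%d/memory/superpages",
                 domid);
        snprintf(val, sizeof(val), "%d", enabled ? 1 : 0);

        /* xs_write() returns true on success. */
        return xs_write(xsh, XBT_NULL, path, val, strlen(val)) ? 0 : -1;
    }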
Mick Jordan
2009-Mar-02 16:43 UTC
Re: [Xen-devel] Design question for PV superpage support
On 03/02/09 05:54, Dave McCracken wrote:

> The solution I am working on for how to support Linux hugepages (Xen superpages) involves creating domains made up entirely of superpages. I can create a working domain with superpages and am in the process of supporting it in save/restore.

This wouldn't work too well for me in the case of thread stacks because we need to map out parts of the stack and, although we want large virtual stacks, we don't want to dedicate that much physical memory. Is it really difficult to support mixed page sizes in the general case, e.g., save/restore etc.?

Mick
Keir Fraser
2009-Mar-02 17:06 UTC
Re: [Xen-devel] Design question for PV superpage support
On 02/03/2009 16:43, "Mick Jordan" <Mick.Jordan@sun.com> wrote:

> On 03/02/09 05:54, Dave McCracken wrote:
>> The solution I am working on for how to support Linux hugepages (Xen superpages) involves creating domains made up entirely of superpages. I can create a working domain with superpages and am in the process of supporting it in save/restore.
>
> This wouldn't work too well for me in the case of thread stacks because we need to map out parts of the stack and, although we want large virtual stacks, we don't want to dedicate that much physical memory. Is it really difficult to support mixed page sizes in the general case, e.g., save/restore etc.?

You can still make 4kB mappings of subsections of 2MB physical extents. And the guest kernel will still be able to allocate subsections of 2MB physical extents for various uses. Isn't that all you need for e.g., this thread stack situation?

Presumably Dave McCracken will be implementing a 'best effort' mode for domains where we try to allocate superpages but we get by at reduced performance if we have to allocate some discontiguous extents due to lack of contiguous available memory. That would be reasonably sensible.

 -- Keir
Dave McCracken
2009-Mar-02 17:29 UTC
Re: [Xen-devel] Design question for PV superpage support
On Monday 02 March 2009, Mick Jordan wrote:

>> The solution I am working on for how to support Linux hugepages (Xen superpages) involves creating domains made up entirely of superpages. I can create a working domain with superpages and am in the process of supporting it in save/restore.
>
> This wouldn't work too well for me in the case of thread stacks because we need to map out parts of the stack and, although we want large virtual stacks, we don't want to dedicate that much physical memory. Is it really difficult to support mixed page sizes in the general case, e.g., save/restore etc.?

What I am doing is populating the domain with 2M pages. The hypervisor fills in all its internal arrays as if they were regular 4K pages. The guest is then free to use mixed size pages. The only significant difference is that when a guest does allocate a 2M page, it's guaranteed to be properly aligned at the machine page level so it can be mapped as a hugepage. All 4K page allocations will continue to work.

Dave McCracken
Oracle Corp.
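
To make the populate step concrete, here is a rough sketch of a domain-builder loop that requests memory one 2MB (order-9) extent at a time. The xc_domain_memory_populate_physmap() prototype shown is the roughly 2009-era libxc one and should be treated as an assumption; error handling and fallback to 4K extents are omitted.

    #include <stdint.h>
    #include <xenctrl.h>

    #define SUPERPAGE_SHIFT    9                        /* 512 * 4K = 2MB */
    #define SUPERPAGE_NR_PFNS  (1UL << SUPERPAGE_SHIFT)

    /* Populate [0, nr_pfns) of the guest physmap from 2MB extents.  Each
     * extent-start pfn handed to the hypervisor gets backed by 512
     * consecutive, 2MB-aligned mfns, which is what gives the alignment
     * guarantee described above. */
    static int populate_with_superpages(int xc_handle, uint32_t domid,
                                        unsigned long nr_pfns)
    {
        unsigned long pfn;
        int rc = 0;

        for (pfn = 0; pfn + SUPERPAGE_NR_PFNS <= nr_pfns && rc == 0;
             pfn += SUPERPAGE_NR_PFNS) {
            xen_pfn_t extent = pfn;
            rc = xc_domain_memory_populate_physmap(xc_handle, domid,
                                                   1,               /* nr_extents   */
                                                   SUPERPAGE_SHIFT, /* extent_order */
                                                   0,               /* address_bits */
                                                   &extent);
        }
        return rc;
    }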
Mick Jordan
2009-Mar-02 17:45 UTC
Re: [Xen-devel] Design question for PV superpage support
On 03/02/09 08:43, Mick Jordan wrote:

> On 03/02/09 05:54, Dave McCracken wrote:
>> The solution I am working on for how to support Linux hugepages (Xen superpages) involves creating domains made up entirely of superpages. I can create a working domain with superpages and am in the process of supporting it in save/restore.

I'm assuming that this means that everything is upgraded from 4K to 2MB. E.g. pfn 0 = 0, pfn 1 = 2MB, etc., and the mfn<->pfn maps also.

> This wouldn't work too well for me in the case of thread stacks because we need to map out parts of the stack and, although we want large virtual stacks, we don't want to dedicate that much physical memory. Is it really difficult to support mixed page sizes in the general case, e.g., save/restore etc.?

Save/restore is definitely important for me and we do support it at present. I'm wondering if I might be able to "reapply" my 2MB mappings after a restore on a 4K system, given that these are just layered on a 1-1 mapping between physical/virtual for all allocated memory.

Mick
Keir Fraser
2009-Mar-02 17:52 UTC
Re: [Xen-devel] Design question for PV superpage support
On 02/03/2009 17:29, "Dave McCracken" <dcm@mccr.org> wrote:

>> This wouldn't work too well for me in the case of thread stacks because we need to map out parts of the stack and, although we want large virtual stacks, we don't want to dedicate that much physical memory. Is it really difficult to support mixed page sizes in the general case, e.g., save/restore etc.?
>
> What I am doing is populating the domain with 2M pages. The hypervisor fills in all its internal arrays as if they were regular 4K pages. The guest is then free to use mixed size pages. The only significant difference is that when a guest does allocate a 2M page, it's guaranteed to be properly aligned at the machine page level so it can be mapped as a hugepage. All 4K page allocations will continue to work.

It'd be nice to fall back to the case of not being able to guarantee all 2MB extents are aligned and contiguous. So for example being able to migrate to or restore on a system that currently doesn't have enough contiguous memory.

 -- Keir
Keir Fraser
2009-Mar-02 17:54 UTC
Re: [Xen-devel] Design question for PV superpage support
On 02/03/2009 17:45, "Mick Jordan" <Mick.Jordan@sun.com> wrote:

>>> The solution I am working on for how to support Linux hugepages (Xen superpages) involves creating domains made up entirely of superpages. I can create a working domain with superpages and am in the process of supporting it in save/restore.
>
> I'm assuming that this means that everything is upgraded from 4K to 2MB. E.g. pfn 0 = 0, pfn 1 = 2MB, etc., and the mfn<->pfn maps also.

No, it doesn't mean that, which will be clear from Dave's response just now.

 K.

>> This wouldn't work too well for me in the case of thread stacks because we need to map out parts of the stack and, although we want large virtual stacks, we don't want to dedicate that much physical memory. Is it really difficult to support mixed page sizes in the general case, e.g., save/restore etc.?
>
> Save/restore is definitely important for me and we do support it at present. I'm wondering if I might be able to "reapply" my 2MB mappings after a restore on a 4K system, given that these are just layered on a 1-1 mapping between physical/virtual for all allocated memory.
Dave McCracken
2009-Mar-02 18:00 UTC
Re: [Xen-devel] Design question for PV superpage support
On Monday 02 March 2009, Mick Jordan wrote:

> On 03/02/09 08:43, Mick Jordan wrote:
>> On 03/02/09 05:54, Dave McCracken wrote:
>>> The solution I am working on for how to support Linux hugepages (Xen superpages) involves creating domains made up entirely of superpages. I can create a working domain with superpages and am in the process of supporting it in save/restore.
>
> I'm assuming that this means that everything is upgraded from 4K to 2MB. E.g. pfn 0 = 0, pfn 1 = 2MB, etc., and the mfn<->pfn maps also.

No, actually, it doesn't do that. The hypervisor allocates 2M pages, then expands them into 4K pages for the mfn<->pfn maps, etc.

The only effective difference is that any given 2M-aligned range of pfns is guaranteed to map to a contiguous 2M-aligned range of mfns. Therefore the guest can safely allocate 2M pages.

Dave McCracken
Oracle Corp.
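
In guest terms, the invariant Dave describes could be checked with something like the sketch below, assuming the usual PV pfn_to_mfn() p2m lookup; the helper itself is invented for illustration, not existing kernel code.

    #define SUPERPAGE_NR_PFNS 512   /* 2MB / 4KB */

    extern unsigned long pfn_to_mfn(unsigned long pfn);

    /* Returns 1 if the 2MB range of pfns starting at start_pfn is backed by
     * a contiguous, 2MB-aligned run of mfns and can therefore be mapped with
     * a single PSE entry; returns 0 if the guest must fall back to 4K. */
    static int pfn_range_is_superpage(unsigned long start_pfn)
    {
        unsigned long base_mfn = pfn_to_mfn(start_pfn);
        unsigned long i;

        if (start_pfn % SUPERPAGE_NR_PFNS)   /* pfn range not 2MB aligned */
            return 0;
        if (base_mfn % SUPERPAGE_NR_PFNS)    /* mfn range not 2MB aligned */
            return 0;

        for (i = 1; i < SUPERPAGE_NR_PFNS; i++)
            if (pfn_to_mfn(start_pfn + i) != base_mfn + i)
                return 0;                    /* discontiguous: use 4K pages */

        return 1;
    }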
Mick Jordan
2009-Mar-02 18:02 UTC
Re: [Xen-devel] Design question for PV superpage support
On 03/02/09 09:06, Keir Fraser wrote:

> On 02/03/2009 16:43, "Mick Jordan" <Mick.Jordan@sun.com> wrote:
>> On 03/02/09 05:54, Dave McCracken wrote:
>>> The solution I am working on for how to support Linux hugepages (Xen superpages) involves creating domains made up entirely of superpages. I can create a working domain with superpages and am in the process of supporting it in save/restore.
>>
>> This wouldn't work too well for me in the case of thread stacks because we need to map out parts of the stack and, although we want large virtual stacks, we don't want to dedicate that much physical memory. Is it really difficult to support mixed page sizes in the general case, e.g., save/restore etc.?
>
> You can still make 4kB mappings of subsections of 2MB physical extents. And the guest kernel will still be able to allocate subsections of 2MB physical extents for various uses. Isn't that all you need for e.g., this thread stack situation?

Yes, that would work, assuming it doesn't cause problems in other ways, e.g. save/restore, given that this re-introduces mixed mappings. I'd appreciate someone explaining the problems for save/restore with the earlier patch that simply allowed 2MB pages in PTEs.

> Presumably Dave McCracken will be implementing a 'best effort' mode for domains where we try to allocate superpages but we get by at reduced performance if we have to allocate some discontiguous extents due to lack of contiguous available memory. That would be reasonably sensible.

Indeed.

Mick
Dan Magenheimer
2009-Mar-02 18:03 UTC
RE: [Xen-devel] Design question for PV superpage support
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>
> It'd be nice to fall back to the case of not being able to guarantee all 2MB extents are aligned and contiguous. So for example being able to migrate to or restore on a system that currently doesn't have enough contiguous memory.

Well, yes and no. I believe the ONLY reason to use 2MB pages is to achieve a significant performance advantage. And I suspect emulating 2MB "virtual pages" on 4KB physical pages will perform at least slightly worse than just 4KB-on-4KB, true?
Mick Jordan
2009-Mar-02 18:14 UTC
Re: [Xen-devel] Design question for PV superpage support
On 03/02/09 10:00, Dave McCracken wrote:

> No, actually, it doesn't do that. The hypervisor allocates 2M pages, then expands them into 4K pages for the mfn<->pfn maps, etc.
>
> The only effective difference is that any given 2M-aligned range of pfns is guaranteed to map to a contiguous 2M-aligned range of mfns. Therefore the guest can safely allocate 2M pages.

Ok. So I want to re-iterate my question from a previous post. After the patch allowing mixed mappings, what exactly went wrong on save/restore? And would my special case of 1-1 physical/virtual mappings, with additional 2MB VM mappings added after domain start, suffer in that case?

From my (brief) experience, I think the problems of finding enough contiguous machine memory to allocate an all-2MB domain might be prohibitive. And when the memory is not fragmented I did not find it hard to "find" contiguous aligned 2MB machine pages even with the usual (seemingly random) pfn -> mfn mappings. It's a bit more code and runtime overhead, but it doesn't happen enough to worry about that.

Mick
Keir Fraser
2009-Mar-02 18:30 UTC
Re: [Xen-devel] Design question for PV superpage support
On 02/03/2009 18:03, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>
>> It'd be nice to fall back to the case of not being able to guarantee all 2MB extents are aligned and contiguous. So for example being able to migrate to or restore on a system that currently doesn't have enough contiguous memory.
>
> Well, yes and no. I believe the ONLY reason to use 2MB pages is to achieve a significant performance advantage. And I suspect emulating 2MB "virtual pages" on 4KB physical pages will perform at least slightly worse than just 4KB-on-4KB, true?

If you make this constraint then you risk creating domains that you cannot always conveniently restore. Obviously you would allocate 2MB extents wherever possible, since that is the whole point of this drawn out exercise.

 -- Keir
Mick Jordan
2009-Mar-02 18:46 UTC
Re: [Xen-devel] Design question for PV superpage support
On 03/02/09 10:30, Keir Fraser wrote:

> On 02/03/2009 18:03, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>>
>>> It'd be nice to fall back to the case of not being able to guarantee all 2MB extents are aligned and contiguous. So for example being able to migrate to or restore on a system that currently doesn't have enough contiguous memory.
>>
>> Well, yes and no. I believe the ONLY reason to use 2MB pages is to achieve a significant performance advantage. And I suspect emulating 2MB "virtual pages" on 4KB physical pages will perform at least slightly worse than just 4KB-on-4KB, true?
>
> If you make this constraint then you risk creating domains that you cannot always conveniently restore. Obviously you would allocate 2MB extents wherever possible, since that is the whole point of this drawn out exercise.

Indeed, performance is the issue: fewer TLB misses. I'm happy to use 2MB pages when I can and fall back on 4K when I can't. I just want Xen not to fall over and save/restore to work.

Mick
Dan Magenheimer
2009-Mar-02 18:48 UTC
RE: [Xen-devel] Design question for PV superpage support
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>
> On 02/03/2009 18:03, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>>>
>>> It'd be nice to fall back to the case of not being able to guarantee all 2MB extents are aligned and contiguous. So for example being able to migrate to or restore on a system that currently doesn't have enough contiguous memory.
>>
>> Well, yes and no. I believe the ONLY reason to use 2MB pages is to achieve a significant performance advantage. And I suspect emulating 2MB "virtual pages" on 4KB physical pages will perform at least slightly worse than just 4KB-on-4KB, true?
>
> If you make this constraint then you risk creating domains that you cannot always conveniently restore. Obviously you would allocate 2MB extents wherever possible, since that is the whole point of this drawn out exercise.

Understood. This is a case where convenience and the primary objective conflict. I can't think offhand of a way to do it, but restoring or migrating a 2MB-assumed domain into an environment where the vast majority of 2MB pages are emulated should probably raise a bright red flag somehow. Or there needs to be some tool that can at least be queried as to how many 2MB pages are being emulated.

But probably the right long-term answer is a 2MB Xen with a 2MB Linux when applications assume/prefer 2MB pages.

Dan
Keir Fraser
2009-Mar-02 19:04 UTC
Re: [Xen-devel] Design question for PV superpage support
On 02/03/2009 18:48, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> If you make this constraint then you risk creating domains that you cannot always conveniently restore. Obviously you would allocate 2MB extents wherever possible, since that is the whole point of this drawn out exercise.
>
> Understood. This is a case where convenience and the primary objective conflict. I can't think offhand of a way to do it, but restoring or migrating a 2MB-assumed domain into an environment where the vast majority of 2MB pages are emulated should probably raise a bright red flag somehow. Or there needs to be some tool that can at least be queried as to how many 2MB pages are being emulated.
>
> But probably the right long-term answer is a 2MB Xen with a 2MB Linux when applications assume/prefer 2MB pages.

I'd certainly be okay with this new config option meaning 'must get 2MB extents' for now. It can be improved if the apparent downsides bite in practice.

 -- Keir
Dave McCracken
2009-Mar-02 19:14 UTC
Re: [Xen-devel] Design question for PV superpage support
On Monday 02 March 2009, Mick Jordan wrote:

> Ok. So I want to re-iterate my question from a previous post. After the patch allowing mixed mappings, what exactly went wrong on save/restore? And would my special case of 1-1 physical/virtual mappings, with additional 2MB VM mappings added after domain start, suffer in that case?

My understanding of save/restore is that it will save your carefully selected 2M pages, cheerfully restore them onto a random set of mfns, then expect your guest to continue running. I haven't studied it enough to know whether your guest at least gets a chance to intervene and fix things after the restore.

Dave McCracken
Oracle Corp.
Jeremy Fitzhardinge
2009-Mar-03 01:15 UTC
Re: [Xen-devel] Design question for PV superpage support
Dave McCracken wrote:

> The solution I am working on for how to support Linux hugepages (Xen superpages) involves creating domains made up entirely of superpages. I can create a working domain with superpages and am in the process of supporting it in save/restore.
>
> For this to work properly this should be an attribute of a domain, specified somewhere in domain configuration and attached to that domain for its lifetime. This way it could be checked at memory populate time, save/restore time, and by the various balloon drivers.
>
> My question is for those of you who best know the overall Xen design principles. Where should the flag be specified by the user? Where should it best be set in the running domain? I've seen some examples of flags being passed around, but would like some guidance on the best place to put it to fit into the Xen design.

One thing I'm not quite sure about: when you support 2M pages for a domain, is it fully supported, to the extent that you can safely set PSE in cpuid and allow the guest kernel to use 2M mappings as it usually would? Or are there further restrictions?

You should support a feature flag in the guest kernel's ELF notes to say that it supports large PV pages. If the kernel asks for it, then you can enable PSE in cpuid, or have some other mechanism for the kernel to query that the feature is available.

J
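
A minimal sketch of the guest side of the handshake Jeremy describes, using the standard pvops xen_feature() query; XENFEAT_superpages and the feature-string name are assumptions invented for the example, not existing Xen definitions.

    #include <linux/types.h>
    #include <xen/features.h>

    /* Hypothetical feature bit; a real number would have to be allocated in
     * the Xen public headers if such a feature were added. */
    #define XENFEAT_superpages 15

    static bool xen_can_use_superpages(void)
    {
        /* The kernel image would also advertise the capability, e.g. in
         * xen-head.S:
         *   ELFNOTE(Xen, XEN_ELFNOTE_FEATURES, .asciz "...|superpages")
         * so the domain builder knows this kernel copes with 2MB extents.
         * Here we only use PSE mappings if the hypervisor confirmed it. */
        return xen_feature(XENFEAT_superpages);
    }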
Jeremy Fitzhardinge
2009-Mar-03 01:32 UTC
Re: [Xen-devel] Design question for PV superpage support
Dave McCracken wrote:

> No, actually, it doesn't do that. The hypervisor allocates 2M pages, then expands them into 4K pages for the mfn<->pfn maps, etc.

What happens if you start using MMU_MACHPHYS_UPDATE on pages which are part of a 2M mapping?

J
Jeremy Fitzhardinge
2009-Mar-03 01:37 UTC
Re: [Xen-devel] Design question for PV superpage support
Dave McCracken wrote:

> My understanding of save/restore is that it will save your carefully selected 2M pages, cheerfully restore them onto a random set of mfns, then expect your guest to continue running. I haven't studied it enough to know whether your guest at least gets a chance to intervene and fix things after the restore.

Your guest would need to be in a position to allocate an extra L1 pte page (512 ptes) to replace each shattered 2M page, which could be awkward - and it wouldn't have any realistic way to continue if it fails to do so. Perhaps some kind of special pool of pages could be provided to the domain to help it satisfy its memory needs in recovering from a restore-shattered large page.

J
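
A conceptual sketch of the recovery step being discussed: rebuilding one restore-shattered 2MB mapping from a reserved L1 page. Every helper named here (alloc_l1_page, install_l1) is illustrative only; pfn_to_mfn() stands for the guest's p2m lookup.

    #define PTRS_PER_PTE 512           /* one L1 page covers 2MB of 4K ptes */

    typedef unsigned long pte_raw_t;

    extern unsigned long pfn_to_mfn(unsigned long pfn);
    extern pte_raw_t *alloc_l1_page(void);   /* taken from a reserved pool */
    extern void install_l1(unsigned long va, pte_raw_t *l1);

    /* Replace the PSE entry covering va with a freshly built L1 page whose
     * 512 ptes point at whatever (now discontiguous) mfns back the pfns. */
    static int shatter_2mb_mapping(unsigned long va, unsigned long start_pfn,
                                   unsigned long prot)
    {
        pte_raw_t *l1 = alloc_l1_page();
        int i;

        if (!l1)
            return -1;     /* reserve exhausted: no realistic way to recover */

        for (i = 0; i < PTRS_PER_PTE; i++)
            l1[i] = (pfn_to_mfn(start_pfn + i) << 12) | prot;

        install_l1(va, l1);
        return 0;
    }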
Mick Jordan
2009-Mar-03 03:59 UTC
Re: [Xen-devel] Design question for PV superpage support
On 03/02/09 17:37, Jeremy Fitzhardinge wrote:

> Dave McCracken wrote:
>> My understanding of save/restore is that it will save your carefully selected 2M pages, cheerfully restore them onto a random set of mfns, then expect your guest to continue running. I haven't studied it enough to know whether your guest at least gets a chance to intervene and fix things after the restore.

Since restore already requires quite a lot of reset on the part of the guest, e.g., grant table mappings, it seems that checking the validity of any large page mappings should be possible at the same time. Obviously you could get in a big mess if you mapped the code that is going to do the fixup on a large page, but that is unlikely and easily avoidable.

> Your guest would need to be in a position to allocate an extra L1 pte page (512 ptes) to replace each shattered 2M page, which could be awkward - and it wouldn't have any realistic way to continue if it fails to do so. Perhaps some kind of special pool of pages could be provided to the domain to help it satisfy its memory needs in recovering from a restore-shattered large page.

In general, I think the guest should assume that large page mappings are merely an optimization that (a) might not be possible on domain start due to machine memory fragmentation and (b) might also become impossible on restore. Given these, it must always be prepared to function with 4K pages, which implies that it would need to preserve enough page table frame memory to be able to revert from large to small pages.

Mick
Dan Magenheimer
2009-Mar-03 14:33 UTC
RE: [Xen-devel] Design question for PV superpage support
> In general, I think the guest should assume that large page mappings are merely an optimization that (a) might not be possible on domain start due to machine memory fragmentation and (b) might also become impossible on restore. Given these, it must always be prepared to function with 4K pages, which implies that it would need to preserve enough page table frame memory to be able to revert from large to small pages.
>
> Mick

Do you disagree with my assertion that use of 2MB pages is almost always an attempt to eke out a performance improvement, that emulating 2MB pages with fragmented 4KB pages is likely slower than just using 4KB pages to start with, and thus that "must always be prepared to function with 4KB pages" should NOT occur silently (if at all)?

BTW, thinking ahead to ballooning with 2MB pages, are we prepared to assume that a relinquished 2MB page can't be fragmented? While this may be appealing for systems where nearly all guests are using 2MB pages, systems where the 2MB guest is an odd duck might suffer substantially by making that assumption.

Dan
Mick Jordan
2009-Mar-03 17:06 UTC
Re: [Xen-devel] Design question for PV superpage support
On 03/03/09 06:33, Dan Magenheimer wrote:

>> In general, I think the guest should assume that large page mappings are merely an optimization that (a) might not be possible on domain start due to machine memory fragmentation and (b) might also become impossible on restore. Given these, it must always be prepared to function with 4K pages, which implies that it would need to preserve enough page table frame memory to be able to revert from large to small pages.
>>
>> Mick
>
> Do you disagree with my assertion that use of 2MB pages is almost always an attempt to eke out a performance improvement, that emulating 2MB pages with fragmented 4KB pages is likely slower than just using 4KB pages to start with, and thus that "must always be prepared to function with 4KB pages" should NOT occur silently (if at all)?

I agree with the first statement. I'm not sure what you mean by "emulate 2MB pages with fragmented 4K pages" unless you assume nested page table support or you just mean falling back to 4K pages. As for whether a change should be silent, I'm less clear on that. I certainly wouldn't consider it a fatal condition requiring domain termination. That position is consistent with the "optimization not correctness" view of using large pages. However, a guest might want to indicate in some way that it has downgraded.

> BTW, thinking ahead to ballooning with 2MB pages, are we prepared to assume that a relinquished 2MB page can't be fragmented? While this may be appealing for systems where nearly all guests are using 2MB pages, systems where the 2MB guest is an odd duck might suffer substantially by making that assumption.

Agreed. All of this really only becomes an issue when memory is overcommitted. Unfortunately, that is precisely when 2MB machine-contiguous pages are likely to be difficult to find.

Mick
Jeremy Fitzhardinge
2009-Mar-03 17:23 UTC
Re: [Xen-devel] Design question for PV superpage support
Mick Jordan wrote:

> On 03/03/09 06:33, Dan Magenheimer wrote:
>>> In general, I think the guest should assume that large page mappings are merely an optimization that (a) might not be possible on domain start due to machine memory fragmentation and (b) might also become impossible on restore. Given these, it must always be prepared to function with 4K pages, which implies that it would need to preserve enough page table frame memory to be able to revert from large to small pages.
>>>
>>> Mick
>>
>> Do you disagree with my assertion that use of 2MB pages is almost always an attempt to eke out a performance improvement, that emulating 2MB pages with fragmented 4KB pages is likely slower than just using 4KB pages to start with, and thus that "must always be prepared to function with 4KB pages" should NOT occur silently (if at all)?
>
> I agree with the first statement. I'm not sure what you mean by "emulate 2MB pages with fragmented 4K pages" unless you assume nested page table support or you just mean falling back to 4K pages. As for whether a change should be silent, I'm less clear on that. I certainly wouldn't consider it a fatal condition requiring domain termination. That position is consistent with the "optimization not correctness" view of using large pages. However, a guest might want to indicate in some way that it has downgraded.

The tradeoff is between the performance gain one might get from using large pages vs the intrusiveness of changes to a PV kernel. Given that when paravirtualizing this we're going to be making small changes to the kernel's existing large page support, rather than adding a new or separate large-page mechanism, we need to make sure that as many of the guest's existing assumptions as possible can be satisfied.

The requirement that a guest be able to come up with enough L1 pagetable pages to be able to map all the shattered 2M mappings at any time definitely doesn't fall into that category. You'd need to:

1. Have an interface for Xen to tell the guest which pages need to be remapped. Presumably this would be in terms of once-contiguous pfn ranges which are now backed with discontinuous mfns.

2. Get the guest to remap those pfns to the new mfns, which will require walking every pagetable of every process searching for those pfns, allocating memory for the new pagetable level.

However, the main use of 2M mappings in Linux is to map the kernel text and data. That's clearly not going to be possible if we need to run kernel code to put things together after a restore. Hm, given that, I guess we could just kludge it into hugetlbfs, but it really does make it a very narrow set of users.

>> BTW, thinking ahead to ballooning with 2MB pages, are we prepared to assume that a relinquished 2MB page can't be fragmented? While this may be appealing for systems where nearly all guests are using 2MB pages, systems where the 2MB guest is an odd duck might suffer substantially by making that assumption.
>
> Agreed. All of this really only becomes an issue when memory is overcommitted. Unfortunately, that is precisely when 2MB machine-contiguous pages are likely to be difficult to find.

If 2M pages are becoming more important, then we should change Xen to do all domain allocations in 2M units, while reserving separate superpages specifically for fragmenting into 4k allocations. It's certainly sensible to always round a domain's initial size up to 2M (most will already be a 2M multiple, I suspect).

Balloon is the obvious exception, but I would argue that ballooning in less than 2M units is a lot of fiddly makework. The difference between giving a domain 128MB vs 126MB is already pretty trivial; dealing with 4k changes in domain size is laughably small. (Now Keir brings up all the difficulties...)

J
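
For illustration, ballooning in 2MB units is essentially a matter of passing extent_order 9 to the existing XENMEM_decrease_reservation interface. A minimal guest-side sketch follows; the helper itself and its error handling are assumptions, not code from the Linux balloon driver.

    #include <xen/interface/xen.h>
    #include <xen/interface/memory.h>
    #include <asm/xen/hypercall.h>

    /* Hand one 2MB-aligned, machine-contiguous extent back to Xen.  Returns
     * the number of extents actually released (1 on success, 0 otherwise). */
    static int balloon_out_2mb(unsigned long start_pfn)
    {
        xen_pfn_t extent = start_pfn;            /* must be 2MB aligned */
        struct xen_memory_reservation reservation = {
            .nr_extents   = 1,
            .extent_order = 9,                   /* 512 * 4K = 2MB */
            .domid        = DOMID_SELF,
        };

        set_xen_guest_handle(reservation.extent_start, &extent);

        return HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation);
    }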
Jeremy Fitzhardinge
2009-Mar-03 17:26 UTC
Re: [Xen-devel] Design question for PV superpage support
Mick Jordan wrote:

> Since restore already requires quite a lot of reset on the part of the guest, e.g., grant table mappings, it seems that checking the validity of any large page mappings should be possible at the same time. Obviously you could get in a big mess if you mapped the code that is going to do the fixup on a large page, but that is unlikely and easily avoidable.

That's actually the most likely case in Linux. Not being able to use 2M mappings for kernel code+data removes about 95% of the utility.

> In general, I think the guest should assume that large page mappings are merely an optimization that (a) might not be possible on domain start due to machine memory fragmentation and (b) might also become impossible on restore. Given these, it must always be prepared to function with 4K pages, which implies that it would need to preserve enough page table frame memory to be able to revert from large to small pages.

I think that's too intrusive. I'd want to see some very convincing measurements to justify doing these kinds of changes to pvops Linux, for example.

J
Jeremy Fitzhardinge
2009-Mar-03 17:28 UTC
Re: [Xen-devel] Design question for PV superpage support
Dan Magenheimer wrote:

> BTW, thinking ahead to ballooning with 2MB pages, are we prepared to assume that a relinquished 2MB page can't be fragmented? While this may be appealing for systems where nearly all guests are using 2MB pages, systems where the 2MB guest is an odd duck might suffer substantially by making that assumption.

Well, I still think that 4k pages are a ludicrously tiny unit of memory for Xen to be dealing with, and it shouldn't bother to get out of bed for less than 2M. If we treat 4k pages as the special case then keeping the Xen heap unfragmented at the 2M level should be fairly easy, no?

J
Dan Magenheimer
2009-Mar-03 18:09 UTC
RE: [Xen-devel] Design question for PV superpage support
> Dan Magenheimer wrote:
>> BTW, thinking ahead to ballooning with 2MB pages, are we prepared to assume that a relinquished 2MB page can't be fragmented? While this may be appealing for systems where nearly all guests are using 2MB pages, systems where the 2MB guest is an odd duck might suffer substantially by making that assumption.
>
> Well, I still think that 4k pages are a ludicrously tiny unit of memory for Xen to be dealing with, and it shouldn't bother to get out of bed for less than 2M. If we treat 4k pages as the special case then keeping the Xen heap unfragmented at the 2M level should be fairly easy, no?

Probably true, though I suspect this is harder than it sounds.
Keir Fraser
2009-Mar-03 18:10 UTC
Re: [Xen-devel] Design question for PV superpage support
On 03/03/2009 17:06, "Mick Jordan" <Mick.Jordan@sun.com> wrote:

> I agree with the first statement. I'm not sure what you mean by "emulate 2MB pages with fragmented 4K pages" unless you assume nested page table support or you just mean falling back to 4K pages. As for whether a change should be silent, I'm less clear on that. I certainly wouldn't consider it a fatal condition requiring domain termination. That position is consistent with the "optimization not correctness" view of using large pages. However, a guest might want to indicate in some way that it has downgraded.

Yeah, I somehow forgot about this actually. Of course it is hard to downgrade to non-2MB pages across save/restore, because the guest-owned pagetables have the superpage mappings baked into them. Oh well, that makes such graceful downgrade much less attractive to implement, so I withdraw the suggestion!

 -- Keir