I was inspired by the talk from Ben Serebrin at this week's summit to investigate using 2MB pages in my Java VM on Xen for x86-64.

I think I have done everything correctly, but Xen (3.1.4) rejects my attempt to set the PSE bit in the L2 frame for the 2MB page. Looking at the Xen code, the L2_DISALLOW_MASK (0xFF800180U) simply rejects the update if the PSE bit is set.

I found some posts from quite a while ago on xen-devel discussing patches to allow large pages, so my question is for clarification on the status of this feature. I.e., is it in any stable release and if so what version?

Thanks
Mick
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
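To make the rejection concrete, here is a minimal sketch of a "no disallowed bits" test against the mask quoted above. It is a simplified model rather than Xen's actual code: the flag names and the helper `l2_flags_allowed` are made up for illustration, and it assumes the low 12 bits of the flags word line up with the architectural x86 PTE flag bits, where PSE is bit 7 (0x80).

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 page-table entry flag bits (low 12 bits). */
#define PTE_PRESENT 0x001u
#define PTE_RW      0x002u
#define PTE_USER    0x004u
#define PTE_PSE     0x080u  /* bit 7: at L2, the entry maps a 2MB page */
#define PTE_GLOBAL  0x100u

/* Mask value quoted in the post for Xen 3.1.4. */
#define L2_DISALLOW_MASK 0xFF800180u

/* Hypothetical helper: returns 1 if none of the given flags fall
 * inside the disallow mask, 0 otherwise. */
static int l2_flags_allowed(uint32_t flags)
{
    return (flags & L2_DISALLOW_MASK) == 0;
}
```

Because 0x180 in the low bits of the mask covers both PSE (0x080) and GLOBAL (0x100), any L2 update carrying the PSE bit fails this kind of test regardless of the other flags.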
Jeremy Fitzhardinge
2009-Feb-27 23:28 UTC
Re: [Xen-devel] 2MB page PV guest support clarification
Mick Jordan wrote:
> I was inspired by the talk from Ben Serebrin at this week's summit to investigate using 2MB pages in my Java VM on Xen for x86-64.
>
> I think I have done everything correctly, but Xen (3.1.4) rejects my attempt to set the PSE bit in the L2 frame for the 2MB page. Looking at the Xen code, the L2_DISALLOW_MASK (0xFF800180U) simply rejects the update if the PSE bit is set.

Yes. Xen doesn't support large mappings for PV guests. However, there's a lot less to worry about for PV guests compared to hvm guests. A PV guest directly uses the CPU's pagetable+tlb hardware, and so a tlb miss results in a single simple walk of the pagetable, and the overall tlb pressure is a lot less. The desire to use large pages for hvm guests is driven by the cost of a tlb miss when you have 4k guest pages layered on 4k host pages, resulting in 24 memory accesses in the worst case; a PV tlb miss is no more expensive than a native tlb miss by comparison.

Large pages could potentially reduce the cost of a PV tlb miss as well, but also pose quite a few tradeoffs. You can't generally use large mappings for the kernel, as you can natively, because of all the pages which need RO mappings (pagetables, gdt, etc). Also, IO and the balloon driver operate at 4k page resolution, so breaking a contiguous 2M page would require the mapping to be shattered.

> I found some posts from quite a while ago on xen-devel discussing patches to allow large pages, so my question is for clarification on the status of this feature. I.e., is it in any stable release and if so what version?

It's a work in progress, but there's nothing usable yet, as far as I know.

J
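The figure of 24 accesses can be reproduced with the usual counting argument for a two-dimensional (nested) page walk. The sketch below assumes 4-level guest and 4-level host pagetables; the function names are illustrative only. Each of the g guest levels needs an h-step host walk to locate the guest table plus one read of the guest entry, and the final guest-physical address needs one more host walk, giving g*(h+1) + h = (g+1)*(h+1) - 1.

```c
#include <assert.h>

/* Worst-case memory references for a nested (guest-on-host) walk
 * with no TLB or paging-structure cache hits. */
static int nested_walk_refs(int guest_levels, int host_levels)
{
    assert(guest_levels > 0 && host_levels > 0);
    return (guest_levels + 1) * (host_levels + 1) - 1;
}

/* A native or PV walk reads one entry per pagetable level. */
static int native_walk_refs(int levels)
{
    assert(levels > 0);
    return levels;
}
```

With four levels on both sides this gives 24 references against 4 for a native or PV walk. Under the same model, mapping the host side with 2MB pages removes one host level and brings the worst case down to 19.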
On 02/27/09 15:28, Jeremy Fitzhardinge wrote:
> Mick Jordan wrote:
>> I was inspired by the talk from Ben Serebrin at this week's summit to investigate using 2MB pages in my Java VM on Xen for x86-64.
>>
>> I think I have done everything correctly, but Xen (3.1.4) rejects my attempt to set the PSE bit in the L2 frame for the 2MB page. Looking at the Xen code, the L2_DISALLOW_MASK (0xFF800180U) simply rejects the update if the PSE bit is set.
>
> Yes. Xen doesn't support large mappings for PV guests. However, there's a lot less to worry about for PV guests compared to hvm guests. A PV guest directly uses the CPU's pagetable+tlb hardware, and so a tlb miss results in a single simple walk of the pagetable, and the overall tlb pressure is a lot less. The desire to use large pages for hvm guests is driven by the cost of a tlb miss when you have 4k guest pages layered on 4k host pages, resulting in 24 memory accesses in the worst case; a PV tlb miss is no more expensive than a native tlb miss by comparison.
>
> Large pages could potentially reduce the cost of a PV tlb miss as well, but also pose quite a few tradeoffs. You can't generally use large mappings for the kernel, as you can natively, because of all the pages which need RO mappings (pagetables, gdt, etc). Also, IO and the balloon driver operate at 4k page resolution, so breaking a contiguous 2M page would require the mapping to be shattered.

Well that's disappointing! The Java heap is a perfect candidate for large pages and, since the heap tends to be large, using them would reduce the number of TLB entries needed to map it by a factor of 512, thereby reducing the misses. I have the luxury of a lot more semantics on memory usage than a typical OS, so would only use large pages where it makes sense (heap and runtime compiled code). I have my own equivalent of the balloon driver.

Mick
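The factor of 512 is simply the ratio of the two page sizes (2MB / 4KB). As a small worked example, the sketch below counts how many pages, and therefore how many distinct TLB entries, are needed to cover a heap. The 1 GiB heap size used in the note afterwards is an assumption chosen for illustration, and `pages_needed` is a made-up helper.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_4K (4096ull)
#define PAGE_2M (2ull * 1024 * 1024)

/* Number of pages of the given size needed to map `bytes` of memory,
 * rounding up to a whole page. */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;
}
```

For a 1 GiB heap this gives 262144 mappings with 4KB pages and 512 with 2MB pages.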
> > I found some posts from quite a while ago on xen-devel discussing patches to allow large pages, so my question is for clarification on the status of this feature. I.e., is it in any stable release and if so what version?
>
> It's a work in progress, but there's nothing usable yet, as far as I know.

Oracle have been working on PV 2MB page support, and I expect they'll pitch in with an update.

Over the last 18 months or so there have been a number of changes to Xen's PV PT handling that make support of 2MB pages significantly easier than it was previously. However, the guest has to be careful how it uses them, as it can't alias any memory that may be used for storing pagetable pages (which must be RO).

Ian
On 02/27/09 16:03, Ian Pratt wrote:
>>> I found some posts from quite a while ago on xen-devel discussing patches to allow large pages, so my question is for clarification on the status of this feature. I.e., is it in any stable release and if so what version?
>>
>> It's a work in progress, but there's nothing usable yet, as far as I know.
>
> Oracle have been working on PV 2MB page support, and I expect they'll pitch in with an update.
>
> Over the last 18 months or so there have been a number of changes to Xen's PV PT handling that make support of 2MB pages significantly easier than it was previously. However, the guest has to be careful how it uses them, as it can't alias any memory that may be used for storing pagetable pages (which must be RO).

Thanks for the update. I'll wait to hear from the Oracle guys.

Your remark about aliasing prompts me to ask a general question about that. I am currently mapping physical to virtual 1-1 (because that is what mini-os has always done) as well as mapping parts of that to other areas in virtual memory. Both of these are RW mappings. Is that ok? It is perfectly possible for me to unmap the 1-1 mappings or make them RO if I have to.

Mick
> Your remark about aliasing prompts me to ask a general question about that. I am currently mapping physical to virtual 1-1 (because that is what mini-os has always done) as well as mapping parts of that to other areas in virtual memory. Both of these are RW mappings. Is that ok? It is perfectly possible for me to unmap the 1-1 mappings or make them RO if I have to.

Any page that is part of a pagetable must be mapped RO in every mapping to it. Attempting to add a page that has RW mappings to a pagetable will fail (either when you make the hypercall to add the PTE, or when you pin a constructed pagetable or try switching to it).

Thus, you need to be careful with 1:1 maps to remove pages that may become PT pages. It's best to have a PT page allocator that tries to allocate PTs from contiguous regions and then recycles them.

Ian
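A PT page allocator of the kind suggested here could be sketched as a small pool: frames come from one contiguous region that is never mapped RW elsewhere, and freed PT frames are reused before any fresh frame is handed out. Everything below (the `pt_pool` structure, its size and its function names) is hypothetical and not taken from Xen or mini-os.

```c
#include <assert.h>
#include <stddef.h>

#define PT_POOL_FRAMES 64  /* hypothetical pool size */

/* Pool of page frames reserved for pagetable use. */
struct pt_pool {
    unsigned long base_pfn;    /* first frame of the contiguous region */
    size_t next_fresh;         /* index of the next never-used frame */
    size_t nfree;              /* number of recycled frames available */
    unsigned long free_list[PT_POOL_FRAMES];
};

static void pt_pool_init(struct pt_pool *p, unsigned long base_pfn)
{
    p->base_pfn = base_pfn;
    p->next_fresh = 0;
    p->nfree = 0;
}

/* Returns a frame number, or 0 when the pool is exhausted
 * (this sketch assumes base_pfn is never 0). */
static unsigned long pt_pool_alloc(struct pt_pool *p)
{
    if (p->nfree > 0)
        return p->free_list[--p->nfree];      /* recycle first */
    if (p->next_fresh < PT_POOL_FRAMES)
        return p->base_pfn + p->next_fresh++; /* then take a fresh frame */
    return 0;
}

/* Hands a frame back to the pool for later reuse as a pagetable. */
static void pt_pool_free(struct pt_pool *p, unsigned long pfn)
{
    assert(pfn >= p->base_pfn && pfn < p->base_pfn + PT_POOL_FRAMES);
    assert(p->nfree < PT_POOL_FRAMES);
    p->free_list[p->nfree++] = pfn;
}
```

Keeping PT frames in their own region means the RO requirement only has to be enforced for that region, and recycling avoids repeatedly flipping ordinary data pages between RW and RO.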
On 02/27/09 17:28, Ian Pratt wrote:
>> Your remark about aliasing prompts me to ask a general question about that. I am currently mapping physical to virtual 1-1 (because that is what mini-os has always done) as well as mapping parts of that to other areas in virtual memory. Both of these are RW mappings. Is that ok? It is perfectly possible for me to unmap the 1-1 mappings or make them RO if I have to.
>
> Any page that is part of a pagetable must be mapped RO in every mapping to it. Attempting to add a page that has RW mappings to a pagetable will fail (either when you make the hypercall to add the PTE, or when you pin a constructed pagetable or try switching to it).
>
> Thus, you need to be careful with 1:1 maps to remove pages that may become PT pages. It's best to have a PT page allocator that tries to allocate PTs from contiguous regions and then recycles them.

Ok. I need to check this. Certainly I am at some point taking already mapped pages and using them as pagetables. However, I am not getting any errors when adding the PTE. So perhaps the code does the mapping change already.

Mick
On 28/02/2009 00:03, "Ian Pratt" <Ian.Pratt@eu.citrix.com> wrote:
>> It's a work in progress, but there's nothing usable yet, as far as I know.
>
> Oracle have been working on PV 2MB page support, and I expect they'll pitch in with an update.
>
> Over the last 18 months or so there have been a number of changes to Xen's PV PT handling that make support of 2MB pages significantly easier than it was previously. However, the guest has to be careful how it uses them, as it can't alias any memory that may be used for storing pagetable pages (which must be RO).

Oracle already got their code checked in. You have to specify 'allowhugepage' on Xen's command line to enable it. It has limitations, such as save/restore doesn't work.

-- Keir
Rolf Neugebauer
2009-Mar-02 10:44 UTC
Re: [Xen-devel] 2MB page PV guest support clarification
Mick Jordan wrote:
> On 02/27/09 17:28, Ian Pratt wrote:
>>> Your remark about aliasing prompts me to ask a general question about that. I am currently mapping physical to virtual 1-1 (because that is what mini-os has always done) as well as mapping parts of that to other areas in virtual memory. Both of these are RW mappings. Is that ok? It is perfectly possible for me to unmap the 1-1 mappings or make them RO if I have to.
>>
>> Any page that is part of a pagetable must be mapped RO in every mapping to it. Attempting to add a page that has RW mappings to a pagetable will fail (either when you make the hypercall to add the PTE, or when you pin a constructed pagetable or try switching to it).
>>
>> Thus, you need to be careful with 1:1 maps to remove pages that may become PT pages. It's best to have a PT page allocator that tries to allocate PTs from contiguous regions and then recycles them.
>
> Ok. I need to check this. Certainly I am at some point taking already mapped pages and using them as pagetables. However, I am not getting any errors when adding the PTE. So perhaps the code does the mapping change already.

In mini-os, new_pt_frame() will update the 1:1 mapping to mark a PT page RO before hooking it into the page table.

rolf
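The ordering described here (make the existing mapping RO first, then hook the frame into the page table) can be illustrated with a toy model. This is not the mini-os code: `direct_map`, `pte_make_readonly` and `prepare_pt_frame` are hypothetical names, and a real PV guest would change the PTE through a hypercall rather than a plain store.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_PRESENT 0x001ull
#define PTE_RW      0x002ull

/* Clear the writable bit, leaving the frame address and all other
 * flags untouched. */
static uint64_t pte_make_readonly(uint64_t pte)
{
    return pte & ~PTE_RW;
}

/* Toy model: direct_map[pfn] holds the PTE of the 1:1 mapping for
 * frame pfn. Downgrade that mapping before the frame is used as a
 * pagetable. */
static void prepare_pt_frame(uint64_t *direct_map, unsigned long pfn)
{
    assert(direct_map[pfn] & PTE_PRESENT);
    direct_map[pfn] = pte_make_readonly(direct_map[pfn]);
}
```

If a guest keeps other RW aliases of the same frame, each of them would need the same treatment before the frame could be accepted as a pagetable.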
Dave McCracken
2009-Mar-02 13:45 UTC
Re: [Xen-devel] 2MB page PV guest support clarification
On Saturday 28 February 2009, Keir Fraser wrote:
> On 28/02/2009 00:03, "Ian Pratt" <Ian.Pratt@eu.citrix.com> wrote:
> >> It's a work in progress, but there's nothing usable yet, as far as I know.
> >
> > Oracle have been working on PV 2MB page support, and I expect they'll pitch in with an update.
> >
> > Over the last 18 months or so there have been a number of changes to Xen's PV PT handling that make support of 2MB pages significantly easier than it was previously. However, the guest has to be careful how it uses them, as it can't alias any memory that may be used for storing pagetable pages (which must be RO).
>
> Oracle already got their code checked in. You have to specify 'allowhugepage' on Xen's command line to enable it. It has limitations, such as save/restore doesn't work.

I am the person at Oracle working on PV guest support for 2MB pages. I did get an initial patch accepted into the Xen hypervisor that enables basic 2MB page support. As Keir said, it requires 'allowhugepage' on the Xen hypervisor command line. It supports the basic ability to specify PSE in the page table, and takes care of the associated type and reference tracking for the mapped page(s).

What this patch does not do is make any guarantee about the alignment of the mapped page, which is a hardware requirement. The solution I am working on for this is to create domains with 2MB pages. The hypervisor already supports populating a domain with larger pages. I am working on supporting 2MB page domains at creation time and restore time. This approach will also require that balloon drivers understand and work with 2MB pages.

Dave McCracken
Oracle Corp.
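The alignment requirement mentioned here means that a 2MB mapping has to start on a machine frame number that is a multiple of 512 (2MB / 4KB). A minimal sketch of a guest-side check before building an L2 entry with PSE set might look like the following; the helper names are hypothetical, and the flag values are the architectural x86 ones.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define FRAMES_PER_2M 512ul   /* 2MB / 4KB */
#define PTE_PRESENT   0x001ull
#define PTE_RW        0x002ull
#define PTE_PSE       0x080ull

/* A 2MB mapping must start on a 2MB-aligned machine frame. */
static int mfn_is_2m_aligned(unsigned long mfn)
{
    return (mfn % FRAMES_PER_2M) == 0;
}

/* Builds an L2 entry mapping a 2MB page that starts at `mfn`,
 * or returns 0 (a not-present entry) if the frame is misaligned. */
static uint64_t make_l2_large_entry(unsigned long mfn)
{
    if (!mfn_is_2m_aligned(mfn))
        return 0;
    return ((uint64_t)mfn << PAGE_SHIFT) | PTE_PRESENT | PTE_RW | PTE_PSE;
}
```

The check says nothing about whether the 511 frames that follow actually belong to the guest, which is the part the domain-creation work described above would have to guarantee.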
On 02/28/09 03:12, Keir Fraser wrote:
> Oracle already got their code checked in. You have to specify 'allowhugepage' on Xen's command line to enable it. It has limitations, such as save/restore doesn't work.
>
> -- Keir

Checked into xen-unstable I presume and not, say 3.3.x? So what stable release will this make it into?

Mick
On 02/03/2009 16:23, "Mick Jordan" <Mick.Jordan@sun.com> wrote:
> On 02/28/09 03:12, Keir Fraser wrote:
>> Oracle already got their code checked in. You have to specify 'allowhugepage' on Xen's command line to enable it. It has limitations, such as save/restore doesn't work.
>>
>> -- Keir
>
> Checked into xen-unstable I presume and not, say 3.3.x? So what stable release will this make it into?

3.4.0. It's not generally useful enough to get backported to the 3.3 branch. And if the extra support to make it useful does get checked into xen-unstable, it's almost certainly then going to be too invasive for the 3.3 branch.

-- Keir
On 03/02/09 05:45, Dave McCracken wrote:
> I am the person at Oracle working on PV guest support for 2MB pages. I did get an initial patch accepted into the Xen hypervisor that enables basic 2MB page support. As Keir said, it requires 'allowhugepage' on the Xen hypervisor command line. It supports the basic ability to specify PSE in the page table, and takes care of the associated type and reference tracking for the mapped page(s).
>
> What this patch does not do is make any guarantee about the alignment of the mapped page, which is a hardware requirement. The solution I am working on for this is to create domains with 2MB pages. The hypervisor already supports populating a domain with larger pages. I am working on supporting 2MB page domains at creation time and restore time. This approach will also require that balloon drivers understand and work with 2MB pages.

In my world, I make sure I allocate aligned contiguous machine 2MB pages. Of course that may not always be possible, depending on what you get from Xen. And I've seen some wild outlier cases, such as swiss cheese memory with every other page missing and no physical run longer than two 4K pages!

Mick
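Finding an aligned contiguous run in whatever machine frames Xen hands out could be sketched as a scan over the guest's list of machine frame numbers. The function below is a hypothetical illustration, not code from the guest described in the post; `run` is the number of 4K frames per large page, which would be 512 for 2MB pages.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the first index i such that mfns[i] is a multiple of `run`
 * and mfns[i] .. mfns[i + run - 1] are consecutive frame numbers.
 * Returns n if no such run exists. */
static size_t find_aligned_run(const unsigned long *mfns, size_t n, size_t run)
{
    assert(run > 0);
    for (size_t i = 0; i + run <= n; i++) {
        if (mfns[i] % run != 0)
            continue;                 /* start frame is not aligned */
        size_t k = 1;
        while (k < run && mfns[i + k] == mfns[i] + k)
            k++;                      /* extend while frames are consecutive */
        if (k == run)
            return i;
    }
    return n;
}
```

On swiss cheese memory of the kind described, where every other frame is missing, the scan never finds a run and returns n, so the guest would have to fall back to 4K mappings for that range.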