I implemented a preliminary version of HAP large page support. My testing showed that 32-bit PAE and 64-bit worked well. I also saw a decent performance improvement for certain benchmarks. So before I go too far, I am sending this patch to the community for review/comments. This patch goes with xen-unstable changeset 16281. I will redo it after collecting all ideas.

Thanks,

-Wei

===========
DESIGN IDEAS:
1. Large page requests
- xc_hvm_build.c requests large pages (2MB for now) while starting guests.
- memory.c handles large page requests. If it cannot handle one, it falls back to 4KB pages.

2. P2M table
- The P2M table code takes a page size order as a parameter; it builds the P2M table (setting the PSE bit, etc.) according to the page size.
- Other related functions (such as p2m_audit()) handle the table based on page size too.
- Page split/merge
** A large page will be split into 4KB pages in the P2M table if needed. For instance, if set_p2m_entry() handles a 4KB page but finds the PSE/PRESENT bits are set, it will split the large page into 4KB pages (see the sketch following this mail).
** There is NO merge from 4KB pages back into a large page. Since large pages are only used at the very beginning, in guest_physmap_add(), this is OK for now.

3. HAP
- To access the PSE bit, the L2 pages of the P2M table are installed in a linear mapping at SH_LINEAR_PT_VIRT_START. We borrow this address space since it was not used.

4. gfn_to_mfn translation (P2M)
- gfn_to_mfn_foreign() traverses the P2M table and handles address translation correctly based on the PSE bit.
- gfn_to_mfn_current() accesses SH_LINEAR_PT_VIRT_START to check the PSE bit. If it is set, we handle the translation using the large page. Otherwise, it falls back to the normal RO_MPT_VIRT_START address space to access the P2M L1 pages.

5. M2P translation
- As before, M2P translation still happens at the 4KB level.

AREAS NEEDING COMMENTS:
1. Large pages for 32-bit mode
- 32-bit mode uses 4MB large pages. This is very annoying for xc_hvm_build.c. I don't want to create another 4MB page_array for it.
- Because of this, this area has not been tested very well. I expect changes soon.

2. Shadow paging
- This implementation will affect shadow mode, especially in xc_hvm_build.c and memory.c.
- Where and how do we avoid affecting shadow?

3. Turn it on/off
- Do we want to turn this feature on/off through an option (kernel option or anything else)?

4. Other missing areas?
==========
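As a concrete illustration of the split path in item 2 above, here is a minimal sketch. It is not taken from the patch: the function name p2m_split_superpage() and the exact entry flags are assumptions; only the l2e_*/l1e_* accessors follow Xen's existing naming.

/* Hypothetical sketch of splitting one PSE L2 entry of the P2M table
 * into 4KB entries. Illustrative only; flags and locking are elided. */
static void p2m_split_superpage(l2_pgentry_t *l2e, l1_pgentry_t *l1tab)
{
    unsigned long base_mfn = l2e_get_pfn(*l2e);
    unsigned int i;

    ASSERT(l2e_get_flags(*l2e) & _PAGE_PSE);

    /* Rewrite the superpage mapping as L1_PAGETABLE_ENTRIES (512 on
     * PAE/64-bit) consecutive 4KB mappings of the same machine frames. */
    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
        l1tab[i] = l1e_from_pfn(base_mfn + i, __PAGE_HYPERVISOR);

    /* Point the L2 entry at the new L1 table; the PSE bit is gone. */
    *l2e = l2e_from_page(virt_to_page(l1tab), __PAGE_HYPERVISOR);
}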
On 15/11/07 16:26, "Huang2, Wei" <Wei.Huang2@amd.com> wrote:
> I implemented a preliminary version of HAP large page support. My testing
> showed that 32-bit PAE and 64-bit worked well. I also saw a decent
> performance improvement for certain benchmarks.
>
> So before I go too far, I am sending this patch to the community for
> review/comments. This patch goes with xen-unstable changeset 16281. I will
> redo it after collecting all ideas.

Looks pretty good to me. To get round the 2M/4M distinction I'd write code in terms of 'normal page' and 'super page', where the former is order 0 and the latter is order L2_SUPERPAGE_ORDER (or some such name). I would try to avoid referencing 2M or 4M explicitly as much as possible.

Having to shatter the 0-2MB region for the VGA RAM hole is a shame, but I suppose there's no way round that really.

 -- Keir
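For what it is worth, Keir's suggestion could look something like the sketch below; the names and the config test are illustrative, not existing Xen definitions.

#define NORMAL_PAGE_ORDER   0
#if defined(CONFIG_X86_PAE) || defined(__x86_64__)
#define L2_SUPERPAGE_ORDER  9       /* 512 L1 entries per L2 entry: 2MB */
#else
#define L2_SUPERPAGE_ORDER  10      /* 1024 L1 entries per L2 entry: 4MB */
#endif

/* Everything else is derived, so 2M vs 4M never appears explicitly. */
#define SUPERPAGE_NR_FRAMES (1UL << L2_SUPERPAGE_ORDER)
#define SUPERPAGE_SIZE      (PAGE_SIZE << L2_SUPERPAGE_ORDER)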
Huang2, Wei
2007-Nov-15 17:36 UTC
RE: [Xen-devel] Re: [RFC][PATCH]Large Page Support for HAP
> To get round the 2M/4M distinction I'd write code in terms of 'normal page'
> and 'super page', where the former is order 0 and the latter is order
> L2_SUPERPAGE_ORDER (or some such name). I would try to avoid referencing 2M
> or 4M explicitly as much as possible.

Will do it.

> Having to shatter the 0-2MB region for the VGA RAM hole is a shame, but I
> suppose there's no way round that really.

Since the tools control the page_array (size, order, etc.), this is the only way to do it. My major concern is about handling 4MB pages for 32-bit mode and the impact on shadow paging. Any thoughts?

Thanks,

-Wei
Keir Fraser
2007-Nov-15 17:42 UTC
Re: [Xen-devel] Re: [RFC][PATCH]Large Page Support for HAP
As I said, you should avoid explicitly referencing the actual superpage size, to make supporting 4MB superpages easier. For the impact on shadow paging, that's Tim's area. :-)

 -- Keir

On 15/11/07 17:36, "Huang2, Wei" <Wei.Huang2@amd.com> wrote:
>> Having to shatter the 0-2MB region for the VGA RAM hole is a shame, but I
>> suppose there's no way round that really.
>
> Since the tools control the page_array (size, order, etc.), this is the
> only way to do it. My major concern is about handling 4MB pages for 32-bit
> mode and the impact on shadow paging. Any thoughts?
Tim Deegan wrote:
> Hi,
>
> At 10:26 -0600 on 15 Nov (1195122365), Huang2, Wei wrote:
>> 2. Shadow paging
>> - This implementation will affect shadow mode, especially in
>> xc_hvm_build.c and memory.c.
>> - Where and how do we avoid affecting shadow?
>
> Shadow already uses SH_LINEAR_PT_VIRT_START, so we can't put a
> mapping there.

Given that we don't use SH_LINEAR_PT_VIRT_START in the current HAP mode, I think it is OK to borrow this address space for HAP. You are right that shadow is using it, so it is a bit dangerous. If we prevent large page support in shadow paging, is using SH_LINEAR_PT still acceptable to you?

> Can you just use the normal linear mapping plus the
> RO_MPT mapping of the p2m instead?
>
> Otherwise, the only thing I can see that shadow will need is for the
> callback from the p2m code that writes the entries to be made aware
> of the superpage level-2 entries. It'll need to treat a superpage
> entry the same way as 512/1024 level-1 entries.

Could you elaborate on this idea? RO_MPT is currently being used, and I did not see any spare linear space I can borrow except SH_LINEAR_PT. Do you mean I can still borrow it, but have to handle it correctly in the shadow code if it is a super page?

Thanks,

-Wei

> Cheers,
>
> Tim.
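A loose sketch of the callback behaviour Tim describes, purely illustrative: the callback and helper names are hypothetical, and real shadow code would also need invalidation and locking that is elided here.

/* Hypothetical: when a P2M L2 entry with PSE set is written, behave as
 * if 512 (PAE/64-bit) or 1024 (32-bit non-PAE) L1 entries changed. */
static void shadow_p2m_l2e_written(struct domain *d,
                                   unsigned long first_gfn,
                                   l2_pgentry_t new_l2e)
{
    unsigned int i;

    if ( !(l2e_get_flags(new_l2e) & _PAGE_PSE) )
        return;  /* normal case: an L1 table pointer, handled as before */

    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
        /* shadow_p2m_entry_changed() is a hypothetical per-4KB handler */
        shadow_p2m_entry_changed(d, first_gfn + i,
                                 _mfn(l2e_get_pfn(new_l2e) + i));
}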
Byrne, John (HP Labs)
2007-Nov-16 17:40 UTC
RE: [Xen-devel] [RFC][PATCH]Large Page Support for HAP
Wei,

I have been hacking at this, too, since I am interested in trying 1GB pages to see what they can do. After I dug myself into a hole, I restarted from the beginning and am trying a different approach than modifying xc_hvm_build.c: modify populate_physmap() to opportunistically allocate large pages, if possible. I just thought I'd mention it.

John Byrne
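John's opportunistic approach might look roughly like this inside memory.c. alloc_domheap_pages() is Xen's real allocator, but the surrounding logic and the helper name are a sketch, not his code.

/* Try a superpage-order allocation first; fall back to 4KB pages. */
static struct page_info *alloc_guest_extent(struct domain *d,
                                            unsigned long gpfn,
                                            unsigned int order)
{
    struct page_info *page = NULL;

    /* Only attempt a superpage if the guest frame number is aligned. */
    if ( (order > 0) && ((gpfn & ((1UL << order) - 1)) == 0) )
        page = alloc_domheap_pages(d, order, 0);

    if ( page == NULL )
        /* Opportunism failed or was not attempted: use a 4KB page. */
        page = alloc_domheap_pages(d, 0, 0);

    return page;
}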
John,

If you have a better design, share it with us and I will be happy to work with you. :-) I agree that xc_hvm_build.c does not have to be modified if memory.c is smart enough to scan all the page_array information. But one concern is that sometimes the Xen tools really want to create mappings at a 4KB boundary instead of using a large page. That requires extra information passed from the tools (e.g., xc_hvm_build.c) to memory.c.

-Wei

From: Byrne, John (HP Labs) [mailto:john.l.byrne@hp.com]
Sent: Friday, November 16, 2007 11:41 AM

> I have been hacking at this, too, since I am interested in trying 1GB
> pages to see what they can do. After I dug myself into a hole, I restarted
> from the beginning and am trying a different approach than modifying
> xc_hvm_build.c: modify populate_physmap() to opportunistically allocate
> large pages, if possible. I just thought I'd mention it.
To my mind populate_physmap() should do what it is told w.r.t. extent sizes. I don't mind some modification of xc_hvm_build to support this feature.

 -- Keir

On 16/11/07 17:53, "Huang2, Wei" <Wei.Huang2@amd.com> wrote:
> If you have a better design, share it with us and I will be happy to work
> with you. :-) I agree that xc_hvm_build.c does not have to be modified if
> memory.c is smart enough to scan all the page_array information. But one
> concern is that sometimes the Xen tools really want to create mappings at
> a 4KB boundary instead of using a large page. That requires extra
> information passed from the tools (e.g., xc_hvm_build.c) to memory.c.
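On the tools side, the xc_hvm_build change Keir is happy with could be as small as requesting superpage-order extents and retrying at order 0 on failure. The sketch below assumes the xen-unstable libxc interface of the time; treat the exact signature as an assumption.

#include <xenctrl.h>

#define SUPERPAGE_ORDER 9   /* 2MB extents on PAE/64-bit */

/* Populate one 2MB-aligned extent, falling back to 512 4KB extents.
 * 'page_array' holds the 512 consecutive gpfns starting at 'base'. */
static int populate_one_extent(int xc_handle, uint32_t dom,
                               xen_pfn_t base, xen_pfn_t *page_array)
{
    int rc = xc_domain_memory_populate_physmap(xc_handle, dom,
                                               1 /* nr_extents */,
                                               SUPERPAGE_ORDER,
                                               0 /* address_bits */,
                                               &base);
    if ( rc == 0 )
        return 0;

    /* Hypervisor could not supply a superpage: ask for 4KB pages. */
    return xc_domain_memory_populate_physmap(xc_handle, dom,
                                             1UL << SUPERPAGE_ORDER,
                                             0, 0, page_array);
}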
Stephen C. Tweedie
2007-Nov-19 20:27 UTC
Re: [Xen-devel] [RFC][PATCH]Large Page Support for HAP
Hi,

On Thu, 2007-11-15 at 10:26 -0600, Huang2, Wei wrote:
> DESIGN IDEAS:
> 1. Large page requests
> - xc_hvm_build.c requests large pages (2MB for now) while starting guests.
> - memory.c handles large page requests. If it cannot handle one, it falls
> back to 4KB pages.

It makes me uncomfortable if the guest can't be sure that a PSE request is actually being honoured by the hardware.

A guest OS has to go to a lot of trouble to use large pages. Such pages upset the normal page recycling of the guest, and they are hard to recycle... but the guest expects that the compromises are worth it because large pages are more efficient at the hardware level.

So if the HV is only going to supply them on a best-effort basis --- if a guest cannot actually rely on a large-page request being honoured --- then it's not clear whether this is a net benefit or a net cost to the guest.

--Stephen
On 19/11/07 20:27, "Stephen C. Tweedie" <sct@redhat.com> wrote:
> So if the HV is only going to supply them on a best-effort basis --- if
> a guest cannot actually rely on a large-page request being honoured ---
> then it's not clear whether this is a net benefit or a net cost to the
> guest.

An HVM guest always thinks it has big contiguous chunks of RAM. The superpage mappings get shattered invisibly by the HV in the shadow page tables only if 2M/4M allocations were not actually possible. This shattering happens unconditionally right now, so what's being proposed is a net benefit to HVM guests.

 -- Keir
Stephen C. Tweedie
2007-Nov-20 11:56 UTC
Re: [Xen-devel] [RFC][PATCH]Large Page Support for HAP
Hi,

On Tue, 2007-11-20 at 10:27 +0000, Keir Fraser wrote:
> An HVM guest always thinks it has big contiguous chunks of RAM. The
> superpage mappings get shattered invisibly by the HV in the shadow page
> tables only if 2M/4M allocations were not actually possible. This
> shattering happens unconditionally right now, so what's being proposed
> is a net benefit to HVM guests.

If an HVM guest asks for a bigpage allocation and silently fails to get it, then that is a net loss for the guest --- the guest takes all of the pain for none of the benefits of bigpages.

So, you may be better off not offering bigpages at all than offering them on a best-effort basis; at least that way the guest knows for sure what resources it has available.

I'm not against supporting bigpages. But if there's no way for a guest to know for sure whether it has actually _got_ big pages, then I'm not sure how much use it is.

Note that this probably works fine for controlled benchmark scenarios where you're running a guest on a single carefully-configured host with matched bigpage reservations. But in general, you need bigpages to continue to work predictably over save/restore, migrate, balloon, etc., or else they become a net cost, not a net gain, to the guest.

--Stephen
> On Tue, 2007-11-20 at 10:27 +0000, Keir Fraser wrote:
>> An HVM guest always thinks it has big contiguous chunks of RAM. The
>> superpage mappings get shattered invisibly by the HV in the shadow page
>> tables only if 2M/4M allocations were not actually possible. This
>> shattering happens unconditionally right now, so what's being proposed
>> is a net benefit to HVM guests.
>
> If an HVM guest asks for a bigpage allocation and silently fails to get
> it, then that is a net loss for the guest --- the guest takes all of the
> pain for none of the benefits of bigpages.
>
> So, you may be better off not offering bigpages at all than offering
> them on a best-effort basis; at least that way the guest knows for sure
> what resources it has available.

Unfortunately, a number of guests assume big pages without actually checking for the feature bit explicitly. For example, x86_64 Linux running HVM will assume it has big pages. We're able to hack this assumption out of it in PV mode. IIRC Windows makes the same big page assumption.

Ian
Stephen C. Tweedie
2007-Nov-20 17:19 UTC
RE: [Xen-devel] [RFC][PATCH]Large Page Support for HAP
Hi,

On Tue, 2007-11-20 at 12:31 +0000, Ian Pratt wrote:
> Unfortunately, a number of guests assume big pages without actually
> checking for the feature bit explicitly. For example, x86_64 Linux
> running HVM will assume it has big pages.

Yes, but there's a _big_ difference between the opportunistic uses Linux makes of PSE (bigpage mappings for large static areas like the kernel text) and places where it is an explicit part of the ABI made to applications, as in the case of hugetlbfs.

It's the latter case which concerns me, as hugetlbfs is basically an explicit contract between the guest OS and an application running on it. Providing faked PSE at that level is something that would be best avoided.

I don't have any objection to doing opportunistic PSE for the former case. But telling the two apart is rather hard.

--Stephen
On 20/11/07 17:19, "Stephen C. Tweedie" <sct@redhat.com> wrote:
> It's the latter case which concerns me, as hugetlbfs is basically an
> explicit contract between the guest OS and an application running on it.
> Providing faked PSE at that level is something that would be best
> avoided.
>
> I don't have any objection to doing opportunistic PSE for the former
> case. But telling the two apart is rather hard.

Support for PV guests would be explicit. Hugetlbfs would know whether it had superpages or not, and can fail in the latter case.

 -- Keir
Stephen C. Tweedie
2007-Nov-20 17:40 UTC
Re: [Xen-devel] [RFC][PATCH]Large Page Support for HAP
Hi,

On Tue, 2007-11-20 at 17:22 +0000, Keir Fraser wrote:
>> I don't have any objection to doing opportunistic PSE for the former
>> case. But telling the two apart is rather hard.
>
> Support for PV guests would be explicit. Hugetlbfs would know whether it
> had superpages or not, and can fail in the latter case.

Yep, I'd be assuming that. It's the FV case where it's harder to see how we communicate this information back to the guest.

--Stephen
On 20/11/07 17:40, "Stephen C. Tweedie" <sct@redhat.com> wrote:
>>> I don't have any objection to doing opportunistic PSE for the former
>>> case. But telling the two apart is rather hard.
>>
>> Support for PV guests would be explicit. Hugetlbfs would know whether it
>> had superpages or not, and can fail in the latter case.
>
> Yep, I'd be assuming that. It's the FV case where it's harder to see
> how we communicate this information back to the guest.

Without PV'ing up the guest I don't see how it is possible. If we advertise support for long mode, then that must imply support for PSE.

 -- Keir
Stephen C. Tweedie
2007-Nov-26 17:26 UTC
Re: [Xen-devel] [RFC][PATCH]Large Page Support for HAP
Hi,

On Tue, 2007-11-20 at 17:44 +0000, Keir Fraser wrote:
>> Yep, I'd be assuming that. It's the FV case where it's harder to see
>> how we communicate this information back to the guest.
>
> Without PV'ing up the guest I don't see how it is possible. If we
> advertise support for long mode, then that must imply support for PSE.

Thinking about this --- do we have any way to let the admin easily tell how many PSE pages a guest has successfully created vs. how many have been faked? If an admin can't rely on bigpages, at least they will want to be able to find out when they are working and when they are not.

--Stephen
Byrne, John (HP Labs)
2007-Nov-29 18:48 UTC
RE: [Xen-devel] [RFC][PATCH]Large Page Support for HAP
Wei,

Sorry for being sluggish getting back to you, but my code was not working and I lost a week due to networking issues. (I probably could have debugged my code faster if I'd read your changes more carefully.)

I have nothing so grand as a design; it is a hack to test 2M and 1G super-page performance on a random page-fault/TLB-miss benchmark. What I was hoping for was to have your code transparently support 1G pages, on the assumption that their performance would be far better than 2M pages in this extreme case. Unfortunately for me, on the B1 rev CPU I have, I cannot see any difference between 2M and 1G pages.

I saw something in one document about page splintering when the guest uses smaller pages than the NPT. Is this the issue? Do NPT super-pages not make any performance difference if they are larger than the guest pages?

Thanks,

John Byrne

From: Huang2, Wei [mailto:Wei.Huang2@amd.com]
Sent: Friday, November 16, 2007 9:54 AM

> If you have a better design, share it with us and I will be happy to work
> with you. :-) I agree that xc_hvm_build.c does not have to be modified if
> memory.c is smart enough to scan all the page_array information. But one
> concern is that sometimes the Xen tools really want to create mappings at
> a 4KB boundary instead of using a large page. That requires extra
> information passed from the tools (e.g., xc_hvm_build.c) to memory.c.
Keir,

I'm very late replying to this. I wanted to make sure I had something that worked before continuing the discussion, and things took longer than I'd hoped. Wei has asked me to send along my patch (against 16256) for discussion. (Maybe just to make his look good.) Mine is less complete --- it doesn't handle page shattering when pages are removed --- but it works well enough to start Linux HAP guests with 1G super-pages, which was my primary interest.

My original thought in modifying just populate_physmap() to opportunistically use super-pages was that my try_larger_extents() function in memory.c could be made mode-specific and that the hypervisor was the easiest place to have this kind of policy. (Will IOMMU DMA support for PV guests benefit from super-page allocations?)

I did end up modifying xc_hvm_build, because I wanted to optimize the guest to use 1G pages by using as little memory under 1G as possible. So the memsize_low variable I define is meant to become a parameter that lets the domain config specify a low memory size (I'm using 32MB for now), with the rest of the memory allocated starting at the 1G boundary. Perhaps some general method of specifying the guest memory layout could be developed.

For p2m, I assumed that gfn_to_mfn_current() was an infrequent operation under HAP and that it was not worth doing any direct mapping of the L2/L3 page tables to support it. So gfn_to_mfn_current() in HAP mode just calls gfn_to_mfn_foreign() (modified to note PSE pages) and walks the HAP pagetable (see the sketch below). Perhaps there is a useful idea in this that could be used with Wei's changes.

John Byrne

Keir Fraser wrote:
> To my mind populate_physmap() should do what it is told w.r.t. extent
> sizes. I don't mind some modification of xc_hvm_build to support this
> feature.
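For reference, the PSE-aware walk John describes could look roughly like this. It is an illustrative sketch with a hypothetical helper (p2m_map_l2_table()) for mapping the L2 page of the p2m covering a gfn; it is not the patch itself.

/* PAE/64-bit geometry: 512 entries per table, 2MB superpages. */
static mfn_t p2m_walk_gfn(struct domain *d, unsigned long gfn)
{
    unsigned int l2_off = (gfn >> 9) & 0x1ff;
    unsigned int l1_off = gfn & 0x1ff;
    l2_pgentry_t *l2tab = p2m_map_l2_table(d, gfn);  /* hypothetical */
    l2_pgentry_t l2e = l2tab[l2_off];
    mfn_t mfn = _mfn(INVALID_MFN);

    if ( l2e_get_flags(l2e) & _PAGE_PRESENT )
    {
        if ( l2e_get_flags(l2e) & _PAGE_PSE )
            /* Superpage: base frame plus offset within the 2MB extent. */
            mfn = _mfn(l2e_get_pfn(l2e) + l1_off);
        else
        {
            l1_pgentry_t *l1tab = map_domain_page(l2e_get_pfn(l2e));
            if ( l1e_get_flags(l1tab[l1_off]) & _PAGE_PRESENT )
                mfn = _mfn(l1e_get_pfn(l1tab[l1_off]));
            unmap_domain_page(l1tab);
        }
    }

    unmap_domain_page(l2tab);
    return mfn;
}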