Scott Parish
2005-Apr-19 23:03 UTC
[Xen-devel] understanding __linear_l2_table and friends
I was trying to understand the steps behind domain creation, but I'm having trouble getting past this. Would someone mind saying a few words about what these are and (if still needed) why these calculations work?

xen/include/asm-x86/page.h:

#define linear_l1_table \
    ((l1_pgentry_t *)(LINEAR_PT_VIRT_START))
#define __linear_l2_table \
    ((l2_pgentry_t *)(LINEAR_PT_VIRT_START + \
    (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0))))
#define __linear_l3_table \
    ((l3_pgentry_t *)(LINEAR_PT_VIRT_START + \
    (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0)) + \
    (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<1))))
#define __linear_l4_table \
    ((l4_pgentry_t *)(LINEAR_PT_VIRT_START + \
    (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<0)) + \
    (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<1)) + \
    (LINEAR_PT_VIRT_START >> (PAGETABLE_ORDER<<2))))

Thanks!
sRp

--
Scott Parish

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2005-Apr-20 10:05 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
They aren't actually used during domain building, but anyway: Xen uses the common trick whereby each page directory maps itself. This means that every page-table entry is mapped into the address space at some virtual address. In fact, page directory entries (and PML3 and PML4 entries on x86/64) are also directly accessible in the virtual address space. The macros below are expressions that evaluate to the correct virtual addresses.

 -- Keir

> I was trying to understand the steps behind domain creation, but I'm
> having trouble getting past this. Would someone mind saying a few
> words about what these are and (if still needed) why these calculations
> work?
>
> xen/include/asm-x86/page.h:
> [macro definitions snipped]
>
> Thanks!
> sRp
Gerd Knorr
2005-Apr-20 16:06 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
Keir Fraser <Keir.Fraser@cl.cam.ac.uk> writes:

> They aren't actually used during domain building,

Are they used anywhere else? Especially __linear_l2_table and __linear_l3_table?

> Xen uses the common trick whereby each page directory maps
> itself. This means that every page-table entry is mapped into the
> address space at some virtual address.

Well, in PAE mode that trick doesn't fully work. It will do fine for the l1 tables, I think also for l2, but certainly not for l3 due to address space constraints ...

  Gerd

--
#define printk(args...) fprintf(stderr, ## args)
Ian Pratt
2005-Apr-20 16:25 UTC
RE: [Xen-devel] understanding __linear_l2_table and friends
> > Xen uses the common trick whereby each page directory maps itself.
> > This means that every page-table entry is mapped into the address
> > space at some virtual address.
>
> Well, in PAE mode that trick doesn't fully work. It will do
> fine for the l1 tables, I think also for l2, but certainly
> not for l3 due to address space constraints ...

???

The linear tables for PAE will consume 8MB of VA space, and all the current process's L1, L2 and L3 pages will all be contained within the linear table.

You can use the linear table to update any PTE in the domain's current address space by virtual address.

Ian
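The 8MB figure checks out with some back-of-the-envelope arithmetic (toy calculation, not Xen code): PAE covers a 4GB address space with 4KB pages, and each entry is 64-bit wide.

```python
# Sanity-check the 8MB figure for the PAE linear pagetable view.
PAGE_SIZE = 4096            # 4KB pages
PTE_SIZE  = 8               # PAE page-table entries are 64-bit
VA_SPACE  = 1 << 32         # full 32-bit (4GB) virtual address space

num_l1_entries = VA_SPACE // PAGE_SIZE        # one PTE per 4KB page
linear_table_bytes = num_l1_entries * PTE_SIZE

print(num_l1_entries)       # 1048576 PTEs
print(linear_table_bytes)   # 8388608 bytes = 8MB of VA space
```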
Keir Fraser
2005-Apr-20 16:31 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
On 20 Apr 2005, at 17:25, Ian Pratt wrote:

>> Well, in PAE mode that trick doesn't fully work. It will do
>> fine for the l1 tables, I think also for l2, but certainly
>> not for l3 due to address space constraints ...
>
> ???
>
> The linear tables for PAE will consume 8MB of VA space, and all the
> current process's L1, L2 and L3 pages will all be contained within
> the linear table.
>
> You can use the linear table to update any PTE in the domain's current
> address space by virtual address.

Gerd is correct that it does not fully work for PAE, but not simply because of address-space considerations. The top-level page directory in PAE is not the same format as the lower levels (it contains 4 entries rather than 512), so the trick of it mapping itself doesn't work.

We don't currently use the linear mapping for anything other than L1 entries anyway, except maybe in shadow code, and we can fix it up by other means (separately map the top-level page dir).

 -- Keir
Ian Pratt
2005-Apr-20 18:53 UTC
RE: [Xen-devel] understanding __linear_l2_table and friends
> Gerd is correct that it does not fully work for PAE, but not
> simply because of address-space considerations. The top-level
> page directory in PAE is not the same format as the lower
> levels (it contains 4 entries rather than 512), so the trick
> of it mapping itself doesn't work.

It works at the expense of burning an extra 2MB of VA space in an L2...

We have to take 4 slots in the L2 handling the top of the VA space, and have the four slots point at the 4 L2s. We can use this to access all the L1's and L2's.

We then take another slot in the uppermost L2 and have it point at the L3.

Puke. PAE is utterly disgusting.

Ian
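The VA cost of the workaround Ian describes can be tallied with toy arithmetic (illustration only, not Xen code): each PAE L2 slot spans 2MB, four slots expose all the L1 pages, and the extra slot exposes the L3.

```python
# Toy arithmetic for the PAE self-map workaround: steal L2 slots to map
# the pagetable pages themselves.
L2_SLOT_SPAN = 2 * 1024 * 1024   # each PAE L2 entry maps 2MB of VA

slots_for_l1_view = 4            # 4 slots point at the 4 L2 pages,
                                 # exposing all L1 entries (the 8MB figure)
slots_for_l3_view = 1            # the "extra 2MB": one slot maps the L3

va_cost = (slots_for_l1_view + slots_for_l3_view) * L2_SLOT_SPAN
print(va_cost // (1024 * 1024))  # 10 (MB of VA space burned in total)
```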
Gerd Knorr
2005-Apr-20 19:14 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
On Wed, Apr 20, 2005 at 07:53:00PM +0100, Ian Pratt wrote:

> > Gerd is correct that it does not fully work for PAE, but not
> > simply because of address-space considerations.

Well, sort of. The trick requires that the linear page table address space is aligned to what the topmost page table level can handle, and it eats one entry. We would have to align the linear page table @ 3GB and waste 1GB of address space; then the self-referencing trick would work even with the 3rd level, I think. Obviously not an option ;)

> We have to take 4 slots in the L2 handling the top of the VA space, and
> have the four slots point at the 4 L2s. We can use this to access all
> the L1's and L2's.

That's exactly what I'm doing at the moment.

> We then take another slot in the uppermost L2 and have it point at the
> L3.

That I don't ;)

While I'm at it: which levels are writable pagetables used for (without shadowing)? Only the first? Or also the other ones?

  Gerd

--
#define printk(args...) fprintf(stderr, ## args)
Scott Parish
2005-Apr-20 19:46 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
On Wed, Apr 20, 2005 at 11:05:02AM +0100, Keir Fraser wrote:

> Xen uses the common trick whereby each page directory maps
> itself. This means that every page-table entry is mapped into the
> address space at some virtual address.

So this is the same as NetBSD's recursive page table stuff. Thanks for the explanation.

sRp

--
Scott Parish
Ian Pratt
2005-Apr-20 20:27 UTC
RE: [Xen-devel] understanding __linear_l2_table and friends
> > We have to take 4 slots in the L2 handling the top of the VA space,
> > and have the four slots point at the 4 L2s. We can use this to access
> > all the L1's and L2's.
>
> That's exactly what I'm doing at the moment.
>
> > We then take another slot in the uppermost L2 and have it point at
> > the L3.
>
> That I don't ;)

There are three possible solutions for L3 accesses:

 * wrap them in map_domain_mem. This will be very slow
 * burn 2MB of VA space in an L2 to map the L3
 * insist on every pagetable having a reserved L1 in which we can steal
   a 4KB slot

Both 2 and 3 are plausible, though 3 might waste a little physical memory unless we arranged things such that the kernel could make use of the remaining slots. Having a per-pagetable L2 with reserved slots is going to be enough of a pain anyhow.

> While I'm at it: which levels are writable pagetables used
> for (without shadowing)? Only the first? Or also the other ones?

We currently just use them for L1's, as you typically don't see many batch updates to L2s (at least relatively speaking). We currently use mmu_update hypercalls for L2 updates, though it probably wouldn't be much slower if we just used the instruction emulation path. Since it's all hidden in the setpgd macro it's not a big deal either way...

In the first instance, it probably makes sense to get PAE working using hypercalls everywhere, then debug the emulation path, and finally enable full writeable pagetables.

Cheers,
Ian
Gerd Knorr
2005-Apr-20 21:38 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
> There are three possible solutions for L3 accesses:
> * wrap them in map_domain_mem. This will be very slow
> * burn 2MB of VA space in an L2 to map the L3
> * insist on every pagetable having a reserved L1 in which we can steal
>   a 4KB slot

According to Keir, linear tables are used for L1 access only anyway, so this probably isn't an issue. Besides that, I'd probably go with (1). The l3 in PAE mode is just 4 entries, so access to them is very likely rare, thus I'd rather take the small map/unmap performance hit than try to implement complicated things like (3), which could have unexpected side effects all over the place in the paging code.

> In the first instance, it probably makes sense to get PAE working using
> hypercalls everywhere, then debug the emulation path, and finally
> enable full writeable pagetables.

I'm not that far yet ...

How does the console output of domain 0 work? Is it passed to xen via hypercall? Or does domain 0 manage it itself (very early in boot)?

How far does the boot of the xenolinux kernel in domain 0 get with the initial pagetable setup created by xen's dom0 builder? I think I should see some kernel messages from linux before it actually touches the page tables?

Current state is that xen itself comes up fine, the domain 0 builder completes, but the xenlinux kernel is killed via domain_crash() very early, before the first message appears on the screen, and I'm trying to figure out what is going on ...

  Gerd

--
#define printk(args...) fprintf(stderr, ## args)
Ian Pratt
2005-Apr-20 22:10 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
> > There are three possible solutions for L3 accesses:
> > * wrap them in map_domain_mem. This will be very slow
> > * burn 2MB of VA space in an L2 to map the L3
> > * insist on every pagetable having a reserved L1 in which we can
> >   steal a 4KB slot
>
> According to Keir, linear tables are used for L1 access only anyway, so
> this probably isn't an issue. Besides that, I'd probably go with (1).
> The l3 in PAE mode is just 4 entries, so access to them is very likely
> rare, thus I'd rather take the small map/unmap performance hit than try
> to implement complicated things like (3), which could have unexpected
> side effects all over the place in the paging code.

That'll be OK to get paravirt mode working, but the shadow modes do perform a fair number of accesses to L2 (L3) pages via linear mappings. Scheme #1 will do for starters, though. Scheme #2 is easy too, but we have to be careful how much lowmem we burn.

> How does the console output of domain 0 work? Is it passed to xen via
> hypercall? Or does domain 0 manage it itself (very early in boot)?

It goes via a hypercall. To get early printk, just hack the following into the obvious place in kernel/printk.c after vscnprintf:

  HYPERVISOR_console_io(CONSOLEIO_write, sizeof(printk_buf), printk_buf);

> How far does the boot of the xenolinux kernel in domain 0 get with the
> initial pagetable setup created by xen's dom0 builder? I think
> I should see some kernel messages from linux before it actually
> touches the page tables?

With the above hack, yes.

Cheers,
Ian
Ian Pratt
2005-Apr-21 13:51 UTC
RE: [Xen-devel] understanding __linear_l2_table and friends
One key design decision with PAE para-virtualized guests is how to handle the per-pagetable (as opposed to per-domain) mappings that exist in the hypervisor reserved area. The only ones of these that spring to mind are in fact the linear pagetable mappings.

PAE Linux currently uses a single L2 for all kernel mappings, shared across all pagetables. Thus, when we do the mmu_ext_op hypercall to switch cr3, we'd need to write new values into the appropriate L2 of the destination pagetable before re-loading cr3 (since in reality there'll only ever be one such L2 for the domain, it makes sense to leave an open map_domain_mem to it). The downside of this scheme is that it will cripple the TLB flush filter on Opteron. Linux used to do this until 2.6.11 anyhow, and no-one really complained much. The far bigger problem is that it won't work for SMP guests, at least without making the L2 per-VCPU and updating the L3 accordingly using mm ref counting, which would be messy but do-able.

The alternative is to hack PAE Linux to force the L2 containing kernel mappings to be per-pagetable rather than shared. The downside of this is that we use an extra 4KB per pagetable, and have the hassle of faulting in kernel L2 mappings on demand (like non-PAE Linux has to). This plays nicely with the TLB flush filter, and is fine for SMP guests.

The simplest thing of all in the first instance is to turn all of the linear pagetable accesses into macros taking (exec_domain, offset) and then just implement them using pagetable walks.

What do you guys think? Implement option #3 in the first instance, then aim for #2.

One completely different approach would be to first implement a PAE guest using the "translate, internal" shadow mode, where we don't have to worry about any of this gory stuff. Once it's working, we could then implement a paravirtualized mode to improve performance and save memory. Getting shadow mode working on PAE shouldn't be too hard, as it's been written with 2, 3 and 4 level pagetables in mind.

The shadow mode approach could be implemented in parallel with the paravirt approach. We could even turn it into a race to the first multiuser boot :-)

Cheers,
Ian
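The "macros implemented as pagetable walks" option can be sketched with a toy model (Python stand-in, not Xen code; the dict-based tables are illustrative only — real code would map and unmap each frame, e.g. via map_domain_mem, at every step). A PAE virtual address resolves through a 2 + 9 + 9 bit index split:

```python
# Toy model of resolving an L1 entry by explicit pagetable walk instead
# of going through the linear mapping.  Tables are plain dicts keyed by
# the per-level index extracted from the VA.
def pae_l1e(l3, va):
    """Walk a PAE-style 3-level tree: 2 + 9 + 9 index bits, 4KB pages."""
    l2 = l3[(va >> 30) & 0x3]        # L3 has only 4 entries
    l1 = l2[(va >> 21) & 0x1ff]      # 512 entries per L2
    return l1[(va >> 12) & 0x1ff]    # 512 entries per L1

# Build a one-page toy mapping: VA 0xC0001000 -> "frame 42".
VA = 0xC0001000
l1 = {(VA >> 12) & 0x1ff: "frame 42"}
l2 = {(VA >> 21) & 0x1ff: l1}
l3 = {(VA >> 30) & 0x3: l2}

print(pae_l1e(l3, VA))               # frame 42
```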
Gerd Knorr
2005-Apr-21 19:42 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
> The alternative is to hack PAE Linux to force the L2 containing kernel
> mappings to be per-pagetable rather than shared. The downside of this
> is that we use an extra 4KB per pagetable, and have the hassle of
> faulting in kernel L2 mappings on demand (like non-PAE Linux has to).
> This plays nicely with the TLB flush filter, and is fine for SMP
> guests.

I think that one is better. The topmost L2 table with the kernel mappings is a special case anyway because it also has the hypervisor hole and thus differs from the other three L2 tables when it comes to allocation and verification (and maybe other places as well). I'm considering adding a new page type for the topmost L2 in PAE mode to handle this. Comments? Better ideas?

  Gerd

--
#define printk(args...) fprintf(stderr, ## args)
Ian Pratt
2005-Apr-21 21:13 UTC
RE: [Xen-devel] understanding __linear_l2_table and friends
> > The alternative is to hack PAE Linux to force the L2 containing
> > kernel mappings to be per-pagetable rather than shared. The downside
> > of this is that we use an extra 4KB per pagetable, and have the
> > hassle of faulting in kernel L2 mappings on demand (like non-PAE
> > Linux has to). This plays nicely with the TLB flush filter, and is
> > fine for SMP guests.
>
> I think that one is better.

Good. The only hassle is the need for Linux's demand filling of L2 slots pointing to kernel L1's, but seeing as non-PAE Linux has similar code already, this shouldn't be too hard.

> The topmost L2 table with the
> kernel mappings is a special case anyway because it also has
> the hypervisor hole and thus differs from the other three L2
> tables when it comes to allocation and verification (and
> maybe other places as well).
> I'm considering adding a new page type for the topmost L2 in
> PAE mode to handle this. Comments? Better ideas?

You can just maintain the va back-ptr index for L2's as well as L1's (we may want to do this anyway to implement writeable L2 pagetables at some point). If the va back-ptr == 3, you know it's an L2 with hypervisor slots.

Part of validating an L3 will be to check that the top slot is filled in and pointing to a validated L2. When alloc_l2_table is called with a back pointer index of 3 it will install hypervisor entries in the L2. I think this is much neater.

Best,
Ian
Andi Kleen
2005-Apr-22 11:04 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
On Thu, Apr 21, 2005 at 02:51:34PM +0100, Ian Pratt wrote:

> PAE Linux currently uses a single L2 for all kernel mappings shared
> across all pagetables. Thus, when we do the mmu_ext_op hypercall to
> switch cr3 we'd need to write new values into the appropriate L2 of
> the destination pagetable before re-loading cr3 (since in reality
> there'll only really ever be one such L2 for the domain, it makes sense
> to leave an open map_domain_mem to it.)
>
> The downside of this scheme is that it will cripple the TLB flush
> filter on Opteron.

It also cripples the "adaptive cache" on Intel systems, which assumes that if two HT siblings have the same CR3 then the L1 cache can be shared. If that is false you get L1 cache thrashing in some HT workloads.

> [rest of the proposal snipped]

Since PAE is a temporary crock, I would choose the least intrusive variant to the codebase :)

-Andi
Kip Macy
2005-Apr-22 20:47 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
> Since PAE is a temporary crock, I would choose the least intrusive
> variant to the codebase :)

A temporary crock that is likely to be 80% of Xen's deployments for the next couple of years.

-Kip
Andi Kleen
2005-Apr-23 15:08 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
On Fri, Apr 22, 2005 at 01:47:34PM -0700, Kip Macy wrote:

> > Since PAE is a temporary crock, I would choose the least intrusive
> > variant to the codebase :)
>
> A temporary crock that is likely to be 80% of Xen's deployments for
> the next couple of years.

Very unlikely, since you will have a hard time buying non-x86-64-capable servers in the next couple of years. It is already pretty hard with new boxes. Even desktops are becoming more and more 64bit capable (Intel will even enable it on all Celerons a bit later this year). The only 32bit holdouts left are the very lowend boxes from AMD and Intel, laptops, and VIA. And these generally don't need any PAE since they don't support enough RAM (assuming you don't need the NX hype).

That is why the PAE effort seems so pointless to me. I estimate it will take some months at least until it is stable and released, and at that time most of the new x86 world will be x86-64 capable. The only boxes for which PAE is needed are basically some old servers, and these will be quickly replaced with new 64bit capable ones.

-Andi
Wim Coekaerts
2005-Apr-23 15:13 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
On Sat, Apr 23, 2005 at 05:08:27PM +0200, Andi Kleen wrote:

> That is why the PAE effort seems so pointless to me. I estimate it will
> take some months at least until it is stable and released, and at that
> time most of the new x86 world will be x86-64 capable.
>
> The only boxes for which PAE is needed are basically some old servers,
> and these will be quickly replaced with new 64bit capable ones.

Sorry Andi, I disagree. "Some" is incorrect: there are huge, huge numbers of servers out there, and you don't just replace them. Many potential xen users probably have 100s of relatively recent x86 servers around.

One doesn't just replace servers. Maybe at home, but not companies. If you have a server farm with 4000 systems, you don't just toss it.

I think it's worth the effort.
Andi Kleen
2005-Apr-23 15:20 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends II
Thinking about this a bit more:

On Thu, Apr 21, 2005 at 02:51:34PM +0100, Ian Pratt wrote:

> The downside of this scheme is that it will cripple the TLB flush
> filter on Opteron. Linux used to do this until 2.6.11 anyhow, and
> no-one really complained much. The far bigger problem is that it won't
> work for SMP guests, at least without making the L2 per-VCPU and
> updating the L3 accordingly using mm ref counting, which would be
> messy but do-able.
>
> The alternative is to hack PAE Linux to force the L2 containing kernel
> mappings to be per-pagetable rather than shared. The downside of this
> is that we use an extra 4KB per pagetable, and have the hassle of
> faulting in kernel L2 mappings on demand (like non-PAE Linux has to).
> This plays nicely with the TLB flush filter, and is fine for SMP
> guests.

<without having looked at the Xen code much, but some familiarity with the i386 linux code>

I thought about this a bit more and your second alternative sounds much better. Faulting on the kernel mappings is very infrequent, and usually after some time the PGD is fully set up and only the lower levels of the kernel mappings change, with vmalloc etc. On x86-64 Linux I even initialize it when the PGD is created from a static template page. The remaining cases for very big vmalloc can be handled on demand without too much code. It should be pretty easy to do on i386 too.

> The simplest thing of all in the first instance is to turn all of the
> linear pagetable accesses into macros taking (exec_domain, offset) and
> then just implement them using pagetable walks.
>
> What do you guys think? Implement option #3 in the first instance, then
> aim for #2.

I don't get your numbering; didn't you have only two options? Or does the one below count too?

> One completely different approach would be to first implement a PAE
> guest using the "translate, internal" shadow mode where we don't have
> to worry about any of this gory stuff. Once it's working, we could then
> implement a paravirtualized mode to improve performance and save
> memory. Getting shadow mode working on PAE shouldn't be too hard, as
> it's been written with 2, 3 and 4 level pagetables in mind.

That sounds attractive too, except that duplicated page tables can be a killer on some workloads (databases with many processes and lots of shared memory: you end up with a lot of memory tied up in page tables even with hugetlb). And normally databases are one of the most common workloads for PAE. It might be a good idea to avoid it at least for the para case.

-Andi
Andi Kleen
2005-Apr-23 15:28 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
On Sat, Apr 23, 2005 at 08:13:08AM -0700, Wim Coekaerts wrote:

> "Some" is incorrect: there are huge, huge numbers of servers out there,
> and you don't just replace them. Many potential xen users probably have
> 100s of relatively recent x86 servers around.
>
> One doesn't just replace servers. Maybe at home, but not companies.
> If you have a server farm with 4000 systems, you don't just toss it.
>
> I think it's worth the effort.

You toss it after 3-4 years at least. Let's say 3 years. If you bought them in the last year you very likely already got them 64bit capable. Assuming it takes a year until PAE Xen is usable, they are at least two years old when PAE Xen runs on them. That gives 1 year of usable runtime. Not too much.

My impression is more that people want PAE Xen because 64bit Xen is not quite ready yet, but I would not be surprised if 64bit Xen works sooner than PAE Xen, and then the latter would be obsolete.

In general, from my experience working on PAE Linux, I can say that the complexity of handling more than 4GB RAM with less than 4GB address space is often greatly underestimated. Linux took years before the many corner cases were flushed out, and now it is somewhat fragile. Of course Xen is simpler than Linux, but in many ways it has much less infrastructure to deal with memory pressure, so I would not be surprised if some stuff were harder to handle. So the 1 year estimate for it running well might be optimistic. Making 64bit Xen run well is probably easier, even if it needs more changes and some hacks now.

-Andi
Gerd Knorr
2005-Apr-24 19:55 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
On Sat, Apr 23, 2005 at 05:28:26PM +0200, Andi Kleen wrote:

> If you bought them in the last year you very likely already got them
> 64bit capable.

That the machines are 64bit capable doesn't mean that people will actually run 64bit software on them. Note that the very good backward compatibility of x86_64 machines with 32bit software is one of the key features leading to the success of the processors (lesson learned from ia64 ;)

Not everyone will instantly switch over to 64bit software just because the processor is able to run it; there are still way too many issues with 64bit software. Linux is way ahead compared to most other operating systems, and still there are plenty of problems: OpenOffice is still 32bit, and Firefox runs much more stably in 32bit than in 64bit, to name just two prominent examples. And with non-mainstream software it is even more likely you'll run into not-yet-fixed 64bit bugs.

Nevertheless, I don't expect 80% of the installations to be PAE; that's too much. People will start using 64bit software, but I'm sure not everybody will instantly switch over to 64bit just because the hardware can do it. If only to reduce the maintenance work in a data center with both 32 and 64bit capable machines ...

> In general from my experience working on PAE Linux I can say that the
> complexity of handling more than 4GB RAM with less than 4GB address
> space is often greatly underestimated.
>
> Of course Xen is simpler than Linux, but in many ways it has much less
> infrastructure to deal with memory pressure so I would not be
> surprised if some stuff were harder to handle.

Well, after looking into Xen's mm code I'd say this is no problem for Xen. Xen basically delegates all that work to the guest operating system; it simply doesn't have to deal with memory pressure issues.

> So the 1 year estimate for it running well might be optimistic.

I'd say it is pessimistic, but let's see ... At the moment my PAE xenlinux kernel doesn't survive paging_init() yet. It seems to me that this piece of code already triggers almost everything which must be touched for PAE support in xenlinux and xen though, so I expect a dom0 multi-user boot isn't that far away once paging_init() works fine ;)

  Gerd
David Hopwood
2005-Apr-25 00:41 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
Gerd Knorr wrote:

> On Sat, Apr 23, 2005 at 05:28:26PM +0200, Andi Kleen wrote:
>
>> If you bought them in the last year you very likely already got them
>> 64bit capable.
>
> That the machines are 64bit capable doesn't mean that people will
> actually run 64bit software on them. Note that the very good backward
> compatibility of x86_64 machines with 32bit software is one of the key
> features leading to the success of the processors (lesson learned from
> ia64 ;)

What does that have to do with PAE support in Xen? x86_64 machines do not support PAE, and do not need it to run 32-bit applications. (A good decision by AMD, IMHO. The complexity of supporting PAE along with all the other mode combinations would have been ridiculous.)

--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>
Mark Williamson
2005-Apr-25 00:46 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
> What does that have to do with PAE support in Xen? x86_64 machines
> do not support PAE, and do not need it to run 32-bit applications.

OK, but if you don't use the 64-bit mode at all there's nothing to stop you booting in vanilla PAE mode. Owners of x86_64 boxes may then choose to use PAE to run a basically 32-bit system but still access all their RAM.

Cheers,
Mark
David Hopwood
2005-Apr-25 02:53 UTC
Re: [Xen-devel] understanding __linear_l2_table and friends
Mark Williamson wrote:

>> What does that have to do with PAE support in Xen? x86_64 machines
>> do not support PAE, and do not need it to run 32-bit applications.
>
> OK, but if you don't use the 64-bit mode at all there's nothing to stop
> you booting in vanilla PAE mode.

Oh, you're right. I had somehow got the impression that AMD64 boxes didn't support PAE in "legacy mode" either, but I see that I was mistaken (section 5 of volume 2 of the arch manual).

--
David Hopwood <david.nospam.hopwood@blueyonder.co.uk>