Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-10 19:26 UTC
[Xen-devel] Essay on an important Xen decision (long)
A fundamental architectural decision has to be made for Xen regarding the handling of physical/machine memory; at a high level, the question is:

   Should Xen drivers be made more flexible to accommodate
   different approaches to managing physical memory, or
   should other architectures be required to conform to
   the Xen/x86 model?

A more detailed description of the specific decision is below. The Xen/ia64 community would like to make this decision soon -- possibly at the Xen summit -- as the next steps of Xen/ia64 functionality are significantly affected. Since either choice has an impact on common code and on future Xen architecture, this decision must involve core Xen developers and the broader Xen community rather than just Xen/ia64 developers.

While this may seem to be a trivial matter, such fundamental choices often have a way of pre-selecting future design and implementation directions that can have major negative or positive impacts -- possibly unexpected -- on different parties. For example, a decision might make a Xen developer's life easier but create headaches for a distro or a Linux maintainer. If nothing else, discussing fundamental decision points often helps to bring out and codify/document hidden assumptions about the future.

This is a lengthy document, but I hope to touch on most of the various issues and tradeoffs. Understanding -- or, at a minimum, reading -- this document should probably be a prerequisite for involvement in discussions to resolve this. I would encourage all readers to give the issues and tradeoffs some thought, as the "obvious x86" answer may not be the best answer for the future of Xen.

First, a little terminology and background:

In a virtualized environment, the resources of the physical machine must be subdivided and/or shared between multiple virtual machines. Just as an OS manages memory for its applications, one of the primary roles of a hypervisor is to provide the illusion to each guest OS that it owns some amount of "RAM" in the system. Thus there are two kinds of physical memory addresses: the addresses that a guest believes to be physical addresses, and the addresses that actually refer to RAM (e.g. bus addresses). The literature (and Xen) confusingly labels these as "physical" addresses and "machine" addresses. In a virtualized environment, there must be some way of maintaining the relationship -- or "mapping" -- between physical addresses and machine addresses.

In Xen (across all architectures), there are currently three different approaches for mapping physical addresses to machine addresses:

1) P==M: The guest is given a subset of machine memory that it can access "directly". Accesses to machine memory addresses outside of this range must somehow be restricted (but not necessarily disallowed) by Xen.

2) guest-aware p!=m (P2M): The guest is given max_pages of contiguous physical memory starting at zero, plus the knowledge that physical addresses are different from machine addresses. The guest must understand the difference between a physical address and a machine address and utilize the correct one in different situations.

3) virtual physical (VP): The guest is given max_pages of contiguous physical memory starting at zero. Xen provides the illusion to the guest that this is machine memory; any physical-to-machine translation required for functional correctness is handled invisibly by Xen. VP cannot be used by guests that directly program DMA-based I/O devices because a DMA device requires a machine address and, by definition, the guest knows only about physical addresses.
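[Editorial note: to make the three models concrete, here is a minimal C sketch. This is illustrative only -- not actual Xen or Xenlinux code -- and the p2m table name is invented.]

/* Illustrative sketch only.  "pfn" is a guest physical frame number,
 * "mfn" a machine frame number. */

typedef unsigned long pfn_t;
typedef unsigned long mfn_t;

/* 1) P==M: physical addresses simply are machine addresses (within
 *    the range the guest was given); Xen polices out-of-range access. */
static mfn_t pfn_to_mfn_p_eq_m(pfn_t pfn)
{
        return (mfn_t)pfn;
}

/* 2) P2M: the guest holds a translation table and must itself choose
 *    the right address type, e.g. machine addresses for DMA. */
extern mfn_t guest_p2m_table[];         /* hypothetical table name */
static mfn_t pfn_to_mfn_p2m(pfn_t pfn)
{
        return guest_p2m_table[pfn];
}

/* 3) VP: there is no guest-visible translation at all; Xen translates
 *    invisibly (e.g. while handling a TLB fill), which is exactly why
 *    a VP guest cannot program a DMA device directly. */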
Xen/x86 and Xen/x86_64 use P2M, but switch to VP (aka "shadow mode") for an unprivileged guest when a migration is underway. Xen/ia64 currently uses P==M for domain0 and VP for unprivileged guests. Xen/ppc intends to use VP only.

There is an architectural proposal to change Xen/ia64 so that domain0 uses P2M instead of P==M. We will call this choice P2M and the choice to stay on the current path P==M.

Here's what I think are the key issues/tradeoffs:

XEN CODE IMPACT

Some Xen drivers, such as the blkif driver, have been "converted" to accommodate P==M. Others have not. For example, the balloon driver currently assumes domain0 is P2M and thus does not currently work on Xen/ia64 or Xen/ppc. The word "converted" is quoted because nobody is particularly satisfied with the current state of the converted drivers. Many apparently significant function calls are #define'd out of existence by macros. Other code does radically different things depending on the architecture, or on whether it is being executed by dom0 or an unprivileged domain. And a few ifdefs are sprinkled about. In short, what's done works, but it is an ugly hack. Some believe that the best way to solve this mess is for other architectures to do things more like Xen/x86. Others believe there is an advantage to defining clear abstractions and making the drivers truly more architecture-independent.

P2M will require some rewriting of existing Xen/ia64 core code and significant changes to Xenlinux/ia64 code, but will allow much easier porting of Xen's balloon/networking/migration drivers and also enable some simplifying changes in the Xen block driver. It is fair to guess that it will take at least several weeks or months to rewrite and debug the core and Xenlinux code to get Xen/ia64 back to where it is today, but future driver work will be much faster. Fewer differences from Xen/x86 mean less maintenance work for Xen core and Xen/ia64 developers. I'd imagine also that more code will be shared between Xen/VT-i and Xen/VT-x.

P==M will require Xen's balloon/networking/migration drivers to evolve to incorporate non-P2M models. This can be done, but is most likely to end up (at least in the short term) as a collection of unpalatable hacks like those in the Xen block driver. However, making Xen drivers more tolerant of different approaches may be a good thing for Xen in the long run.

XENLINUX IMPACT

Today's operating systems are not implemented with an understanding that a physical address and a machine address might be different. Building this awareness into an OS requires non-trivial source code change. For example, Xenlinux/x86 maintains a "p2m" mapping table for quick translation and provides an "m2p" hypercall to keep Xen in sync. OS code that manipulates physical addresses must be modified to access/manage this table and make hypercalls when appropriate. Macros can hide much of the complexity, but much OS/driver code exists that does not use standard macros. There is some disagreement about how extensive the required source code changes are, and how difficult it will be to maintain these changes across future versions of guest OSes. One illustrative example, however: in paravirtualizing Xenlinux/ia64, seven header files are changed; it is closer to 40 for Xenlinux/x86.
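[Editorial note: a simplified sketch of the machinery described above, loosely modeled on the Xenlinux/x86 sparse tree of this era; exact names and details vary by version.]

/* Simplified sketch of the Xenlinux/x86 translation machinery. */

extern unsigned long *phys_to_machine_mapping;  /* guest's p2m table  */
extern unsigned long *machine_to_phys_mapping;  /* m2p table, from Xen */

#define pfn_to_mfn(pfn) (phys_to_machine_mapping[(pfn)])
#define mfn_to_pfn(mfn) (machine_to_phys_mapping[(mfn)])

/* Every piece of OS code that hands a "physical" address to hardware
 * or to Xen must be found and converted to use the machine variant: */
#define phys_to_machine(phys) \
        ((pfn_to_mfn((phys) >> PAGE_SHIFT) << PAGE_SHIFT) | \
         ((phys) & ~PAGE_MASK))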
Related: some would assert that pushing a small number of changes into Linux (or any OS, open source or not) is far easier than pushing a large number of changes into Linux. Until all the Xen/x86 changes are in, it remains to be seen whether this is true or not. There is a reasonable concern that the broad review required for such an extensive set of changes will involve a large number of people with a large number of agendas and force a number of Xen design issues to be revisited -- at least clearly justified, if not changed. This is especially true if Xen's foes have any influence in the process.

Transparent paravirtualization (also called "shared binary") is the ability for the same binary to be used both as a Xen guest and natively on real hardware. Xenlinux/ia64 currently supports this; indeed, ignoring a couple of existing bugs, the same Xenlinux/ia64 binary can be used natively, as domain0, and as an unprivileged domain. There have been proposals to do the same for Xenlinux/x86, but the amount of code changed is much higher. There is debate about the cost/benefit of transparent paravirtualization, but the primary beneficiaries -- distros and end customers -- are not very well represented here.

With P2M, it is unlikely that Xenlinux/ia64 will ever again be transparently paravirtualizable. As with Xenlinux/x86, the changes will probably be pushed into a subarch (mach-xen). Since Linux/ia64 has a more diverse set of subarches, there may be additional work to ensure that Xen is orthogonal to (and thus works with) all the subarches.

P==M would continue to allow transparent paravirtualization. This, plus the reduced number of changes, should make it easier to get Xen/ia64 support into Linux/ia64 (assuming Xen/x86 support gets included in Linux/x86).

DRIVER DOMAINS

Driver domains are "coming soon" and support for driver domains is a "must"; however, support for hybrid driver domains (i.e. domains that utilize both backend and frontend drivers) is open to debate. It can be assumed, however, that all driver domains will require DMA access.

P2M should make driver domains easier to implement (once the initial Xenlinux/ia64 work is completed) and able to support a broader range of functionality. P==M may disallow hybrid driver domains and create other restrictions, though some creative person may be able to solve these.

FUTURE XEN FEATURE SUPPORT

None of the approaches has been "design-tested" significantly for support of, or compatibility with, future Xen functionality such as oversubscription or machine-memory hot-plug, nor for exotic machine memory topologies such as NUMA or discontig (sparsely populated) memory. Such functionalities and topologies are much more likely to be encountered in high-end server architectures than in widely available PCs and low-end servers. There is some debate as to whether the existing Xen memory architecture will easily evolve to accommodate these future changes, or whether more fundamental changes will be required. Architectural decisions and restrictions should be made with these uncertainties in mind.

Some believe that discovery and policy for machine memory will eventually need to move out of Xen into domain0, leaving only the enforcement mechanism in Xen. For example, oversubscription, NUMA, and hot-plug memory support are likely to be fairly complicated, and a commonly stated goal is to move unnecessary complexity out of Xen. And the plethora of recent changes in Linux/ia64 involving machine memory models indicates there are still many unknowns.
P==M more easily supports a model where domain0 owns ALL of machine memory *except* a small amount reserved for and protected by Xen itself. If this is all true, Xen/x86 may eventually need to move to a dom0 P==M model, in which case it would be silly for Xen/ia64 to move to P2M and then back to P==M.

Others think these features will be easy to implement in Xen and, with minor changes, entirely compatible with P2M. And that P2M is the once and future model for domain0.

SUMMARY

I'm sure there are more issues and tradeoffs that will come up in discussion, but let me summarize these:

Move domain0 to P2M:
+ Fewer differences in Xen drivers between Xen/x86 and Xen/ia64
+ Fewer differences in Xen drivers between Xen/VT-x and Xen/VT-i
+ Easier to implement remaining Xen drivers for Xen/ia64
- Major changes may require months for Xen/ia64 to regain stability
- Many more changes to Xenlinux/ia64; more difficulty pushing upstream
- No attempt to make Xen more resilient for future architectures

Leave domain0 as P==M:
+ Fewer changes in Xenlinux; easier to push upstream
+ Making Xen more flexible is a good thing
? May provide better foundation for future features (oversubscription, NUMA)
- More restrictions on driver domains
- More hacks required for some Xen drivers, or
- More work to better abstract and define a portable driver architecture
Mark Williamson
2006-Jan-10 19:34 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Dan,

Thanks for the summary, it's nice to see all the arguments presented together.

> Xen/x86 and Xen/x86_64 use P2M, but switch to VP (aka "shadow
> mode") for an unprivileged guest when a migration is underway.
> Xen/ia64 currently uses P==M for domain0 and VP for unprivileged
> guests. Xen/ppc intends to use VP only.

NB. the shadow mode for migration (logdirty) doesn't actually virtualise the physical <-> machine mapping - a paravirt guest on x86 always knows where all its pages are in machine memory. All that's being hidden in this case is that the pagetables are being shadowed (so that pages can be transparently write protected).

> Driver domains are "coming soon" and support of driver domains is a
> "must", however support for hybrid driver domains (i.e. domains that
> utilize both backend and frontend drivers) is open to debate. It can
> be assumed however that all driver domains will require DMA access.
>
> P2M should make driver domains easier to implement (once the initial
> Xenlinux/ia64 work is completed) and able to support a broader range
> of functionality. P==M may disallow hybrid driver domains and
> create other restrictions, though some creative person may be able
> to solve these.

I'd think that driver domains themselves would be quite attractive on IA64 - for big boxes, it allows you to partition the hardware devices *and* potentially improve uptime by isolating driver faults.

For what you call "hybrid" domains, there are people using this for virtual DMZ functionality... I guess it'd be nice to enable it. Presumably the problem is that the backend does some sort of P-to-M translation itself?

Do you have a plan for how you would implement P==M driver domains?

Cheers,
Mark
Anthony Liguori
2006-Jan-10 19:55 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Hi Dan,

Thanks for the thorough explanation of physical memory virtualization. It's a topic on which there isn't a lot of good reference material.

You seem to conclude that the only possible solutions are making the dom0 either P==M or P2M. Is it not possible to make dom0 VP?

If the only issue for making dom0 VP is DMA, wouldn't it be easier to modify the Linux DMA subsystem[1] to make a special hypercall to essentially pin a VP to a particular MFN that could be used for the DMA? One could imagine the hypervisor reserving low memory specifically for DMA such that bounce buffers could be avoided too.

VP makes a lot of interesting memory optimizations considerably easier (memory compacting, swapping, etc.).

[1] Realizing that I know very little about the Linux DMA subsystem, so I don't know if this is outside the realm of possibilities.

Regards,

Anthony Liguori
Hollis Blanchard
2006-Jan-10 23:02 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On Tue, 2006-01-10 at 11:26 -0800, Magenheimer, Dan (HP Labs Fort Collins) wrote:

> 1) P==M: The guest is given a subset of machine memory that it
> can access "directly". Accesses to machine memory addresses
> outside of this range must somehow be restricted (but not
> necessarily disallowed) by Xen.
>
> [...]
>
> Xen/x86 and Xen/x86_64 use P2M, but switch to VP (aka "shadow
> mode") for an unprivileged guest when a migration is underway.
> Xen/ia64 currently uses P==M for domain0 and VP for unprivileged
> guests. Xen/ppc intends to use VP only.
>
> There is an architectural proposal to change Xen/ia64 so that
> domain0 uses P2M instead of P==M. We will call this choice P2M
> and the choice to stay on the current path P==M.

So ia64 dom0 physical 0 is machine 0? Where does Xen live in machine space?

PowerPC exception handlers are architecturally hardcoded to the first couple of pages of memory, so Xen needs to live there. Linux expects it is booting at 0, of course, so dom0 runs in an offset physical address space.

The trouble then comes when dom0 needs to access IO or domU memory; obviously dom0 must have some awareness of the machine space. Accordingly, I'm thinking I'm going to need to install p2m tables in dom0, and once they're there, why not have domU use them too?

-- 
Hollis Blanchard
IBM Linux Technology Center
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-11 00:13 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> NB. the shadow mode for migration (logdirty) doesn't actually
> virtualise the physical <-> machine mapping - a paravirt guest on x86
> always knows where all its pages are in machine memory. All that's
> being hidden in this case is that the pagetables are being shadowed
> (so that pages can be transparently write protected).

Thanks for the clarification!

> I'd think that driver domains themselves would be quite attractive on
> IA64 - for big boxes, it allows you to partition the hardware devices
> *and* potentially improve uptime by isolating driver faults.

Probably true, but I think most "big box" customers are looking for partition isolation beyond what is possible with Xen (at least near-term).

> For what you call "hybrid" domains, there are people using this for
> virtual DMZ functionality... I guess it'd be nice to enable it.
> Presumably the problem is that the backend does some sort of P-to-M
> translation itself?
>
> Do you have a plan for how you would implement P==M driver domains?

Only roughly. Detailed design and implementation was to wait until after driver domain support gets back into Xen/x86 (and until after this P?M decision is made).

Dan
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-11 00:22 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> You seem to conclude that the only possible solutions are making the
> dom0 either P==M or P2M. Is it not possible to make dom0 VP?
>
> If the only issue for making dom0 VP is DMA, wouldn't it be easier to
> modify the Linux DMA subsystem[1] to make a special hypercall to
> essentially pin a VP to a particular MFN that could be used for the
> DMA? One could imagine the hypervisor reserving low memory
> specifically for DMA such that bounce buffers could be avoided too.
>
> [1] Realizing that I know very little about the Linux DMA subsystem,
> so I don't know if this is outside the realm of possibilities.

Technically, if the guest source needs to be changed so that some code deals with physical addresses and other code deals with machine addresses, I would call that a flavor of P2M. If the "DMA subsystem" is the only place where the mapping needs to be done and the affected code can be cleanly isolated, your suggestion is a good one. I'm no expert on Linux DMA code either, but I believe it isn't very clean.

> VP makes a lot of interesting memory optimizations considerably
> easier (memory compacting, swapping, etc.).

Yes, definitely, and oversubscription, different kinds of migration, NUMA physical memory affinity migration, etc.

Dan
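[Editorial note: as a strawman, the "pin a VP page for DMA" idea being discussed might look like the following. This is entirely hypothetical -- no such hypercall exists in Xen.]

/* Hypothetical sketch only -- the hypercall name and interface are
 * invented.  A VP dom0's DMA layer would pin a guest physical frame
 * and ask Xen for a stable machine frame to program into the device. */

long hypervisor_pin_pfn_for_dma(unsigned long pfn, unsigned long *mfn);

static unsigned long vp_dma_addr(unsigned long pfn, unsigned long offset)
{
        unsigned long mfn;

        if (hypervisor_pin_pfn_for_dma(pfn, &mfn) != 0)
                return 0;       /* caller falls back to a bounce buffer */
        return (mfn << PAGE_SHIFT) + offset;    /* machine/bus address */
}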
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-11 00:39 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> So ia64 dom0 physical 0 is machine 0? Where does Xen live in machine
> space?
>
> PowerPC exception handlers are architecturally hardcoded to the first
> couple of pages of memory, so Xen needs to live there. Linux expects
> it is booting at 0 of course, so dom0 runs in an offset physical
> address space.

On ia64, Xen (and Linux when booting natively) is relocatable. Machine address 0 is not special on ia64 like it is on PowerPC.

> The trouble then comes when dom0 needs to access IO or domU memory;
> obviously dom0 must have some awareness of the machine space.
> Accordingly, I'm thinking I'm going to need to install p2m tables in
> dom0, and once they're there, why not have domU use them too?

On ia64, machine memory is exposed to a native OS via EFI (firmware) tables. (I think these are similar to e820 on x86 machines; I don't know how this is done on PowerPC.) When Xen/ia64 starts domain0 (or a domU), it passes a faked EFI table. This table is faked differently for domain0 and domUs.

One solution, for example, would be for Xen to "give" all machine memory to dom0, protecting only a small portion for itself. Then, when other domains are created, all the memory for domUs would be "ballooned" from dom0.

Per the previous exchange with Anthony, there are many advantages to being able to move memory around invisibly to domains, which is easy with VP and much harder with P2M. The current debate on Xen/ia64 is just for domain0, but it could expand...
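[Editorial note: a rough sketch of what "faking" an EFI table amounts to -- fabricating memory descriptors in the map handed to a domain. Simplified and illustrative; the real code builds a complete map including MMIO, runtime-services and reserved ranges.]

#include <linux/efi.h>

/* Fabricate one "conventional memory" (RAM) descriptor for the EFI
 * memory map given to a domain. */
static void fake_ram_descriptor(efi_memory_desc_t *md,
                                unsigned long start, unsigned long npages)
{
        md->type      = EFI_CONVENTIONAL_MEMORY;
        md->phys_addr = start;  /* dom0 P==M: machine address; domU VP: 0 */
        md->num_pages = npages;
        md->attribute = EFI_MEMORY_WB;          /* cacheable RAM */
}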
Tian, Kevin
2006-Jan-11 07:56 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> From: Magenheimer, Dan
> Sent: 2006-Jan-11 3:26

Hi, Dan,

Good background for discussion.

> [...]
> an ugly hack. Some believe that the best way to solve this mess
> is for other architectures to do things more like Xen/x86. Others
> believe there is an advantage to defining clear abstractions and
> making the drivers truly more architecture-independent.

I would say the two options above don't actually conflict. ;-) Move toward Xen/x86 for the things that are really common, with clearer abstraction for the architecture differences. We need to carefully differentiate which part of the mess really comes from architectural reasons, and which part is common but was simply missed due to early quick bring-up requirements. I don't think this has received enough attention so far.

Xen, as a well-formed product, needs to have common policies and common features on all architectures. Maybe implementing the same features will be more difficult, and even bring some performance impact, on some architectures, but it's a must-have requirement from the customer's point of view if the customer demands it. I just raise it here as an important factor when considering the final solution across architectures.

> [...]
> XENLINUX IMPACT
>
> Xen in sync. OS code that manipulates physical addresses must be
> modified to access/manage this table and make hypercalls when
> appropriate. Macros can hide much of the complexity but much OS/driver
> code exists that does not use standard macros. There is some

This seems to be an issue for driver modules that need to be re-compiled... ;-(

> Transparent paravirtualization (also called "shared binary") is the
> ability for the same binary to be used both as a Xen guest and
> natively on real hardware. [...] There is debate
> about the cost/benefit of transparent paravirtualization, but the
> primary beneficiaries -- distros and end customers -- are not very
> well represented here.

Transparency is welcome, but that doesn't mean conservative self-restriction on modifications to xenlinux. Transparency with good performance is the goal to pursue, though xenlinux/x86 does need more effort to make that happen.

> With P2M, it is unlikely that Xenlinux/ia64 will ever again be
> transparently paravirtualizable. As with Xenlinux/x86, the changes
> will probably be pushed into a subarch (mach-xen).

First a sub-arch, and later a configurable feature with negligible impact on native running? ;-)

> [...]
>
> Some believe that discovery and policy for machine memory will
> eventually need to move out of Xen into domain0, leaving only
> enforcement mechanism in Xen. [...] P==M more easily supports a model
> where domain0 owns ALL of machine memory *except* a small amount
> reserved for and protected by Xen itself. If this is all true, Xen/x86
> may eventually need to move to a dom0 P==M model, in which case it
> would be silly for Xen/ia64 to move to P2M and then back to P==M.

I don't think a complete takeover by dom0 is a good design choice.
Moving ownership to dom0 doesn't mean a simple move, since the memory subsystem is the core/base of Xen. Extra context switches are added for any page-related operation. Also, with the P==M model, how do you ensure a scalable allocation environment after a long run? Any activity within dom0 that consumes physical frames actually eats machine frames. Security may be another issue, though I can't come up with a clear example immediately...

> SUMMARY
> [...]

This summary is good.

Thanks,
Kevin
Gerd Hoffmann
2006-Jan-11 09:33 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Hi,

> If the only issue for making dom0 VP is DMA, wouldn't it be easier to
> modify the Linux DMA subsystem[1] to make a special hypercall to
> essentially pin a VP to a particular MFN that could be used for the
> DMA?

Linux has a nice API for DMA memory management, see Documentation/DMA-mapping.txt. Basically you pass in a "struct page" and an offset (within that page) and get back a dma address you can pass on to your hardware. That is required for some architectures where physical addresses (as seen by the CPU) and bus addresses (as seen by the PCI devices) are not identical. It's also needed on archs which have an iommu, to create/delete mapping entries there.

I think that API should do just fine for any DMA transfer dom0 wants to do for its own pages. xenlinux would simply need a special implementation of that API which calls xen to translate the VP address into a dma address (usually the same as the machine address). Probably xen must also handle an iommu (if present) to ensure secure dma once we have hardware which supports this.

A bit more tricky are DMA transfers for _other_ domains (i.e. what the blkback driver has to do). blkback maps the foreign domain pages into its own address space, and I think there is no way around that right now API-wise, as otherwise there isn't a "struct page" for the page ...

cheers,

Gerd

-- 
Gerd 'just married' Hoffmann <kraxel@suse.de>
I'm the hacker formerly known as Gerd Knorr.
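[Editorial note: for reference, a driver's use of the generic DMA API Gerd describes looks roughly like this. Simplified sketch; error handling omitted, and the Xen angle in the comment is speculative.]

#include <linux/dma-mapping.h>

/* Map a page for device access, program the device, then unmap.
 * A VP xenlinux could provide its own implementation of
 * dma_map_page() that makes a (hypothetical) hypercall to translate
 * and pin the page. */
static void example_dma_to_device(struct device *dev, struct page *page,
                                  unsigned long offset, size_t len)
{
        dma_addr_t bus;

        bus = dma_map_page(dev, page, offset, len, DMA_TO_DEVICE);
        /* ... program 'bus' into the hardware, wait for completion ... */
        dma_unmap_page(dev, bus, len, DMA_TO_DEVICE);
}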
Keir Fraser
2006-Jan-11 10:08 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On 10 Jan 2006, at 19:55, Anthony Liguori wrote:

> You seem to conclude that the only possible solutions are making the
> dom0 either P==M or P2M. Is it not possible to make dom0 VP?
>
> If the only issue for making dom0 VP is DMA, wouldn't it be easier to
> modify the Linux DMA subsystem[1] to make a special hypercall to
> essentially pin a VP to a particular MFN that could be used for the
> DMA? One could imagine the hypervisor reserving low memory
> specifically for DMA such that bounce buffers could be avoided too.
>
> VP makes a lot of interesting memory optimizations considerably
> easier (memory compacting, swapping, etc.).

On an architecture where VP is cheaper to implement than on x86, it may well make sense to do that in preference to P2M. As you say, it makes certain future extensions less of a pain to implement.

If ia64 does decide to back off from the P==M route then I suspect VP is the way to go (which is, I think, how ia64 domUs currently work anyway).

 -- Keir
Tristan Gingold
2006-Jan-11 10:46 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On Tuesday 10 January 2006 20:55, Anthony Liguori wrote:

> You seem to conclude that the only possible solutions are making the
> dom0 either P==M or P2M. Is it not possible to make dom0 VP?
>
> If the only issue for making dom0 VP is DMA, wouldn't it be easier to
> modify the Linux DMA subsystem[1] to make a special hypercall to
> essentially pin a VP to a particular MFN that could be used for the
> DMA?

Hi,

A few years ago (it was with Linux 2.2), I wrote device drivers for rather complex hardware. The DMA subsystem didn't really exist then. The main reason is a hardware one: standalone DMA chips do not exist anymore, because nowadays (roughly since PCI) every device chip does DMA by itself.

Tristan.
Harry Butterworth
2006-Jan-11 13:37 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On Tue, 2006-01-10 at 11:26 -0800, Magenheimer, Dan (HP Labs Fort Collins) wrote:

> A fundamental architectural decision has to be made for
> Xen regarding handling of physical/machine memory; at a high
> level, the question is:
>
> Should Xen drivers be made more flexible to accommodate
> different approaches to managing physical memory, or
> should other architectures be required to conform to
> the Xen/x86 model?

I believe the right approach is to decouple the driver implementation from the memory management architecture by defining a high level API to build the drivers on. The API should be expressed in terms of the operations that the drivers need to perform, rather than in terms of the underlying primitives that are actually used to perform those operations.

Such an API would allow decisions about memory management to be made independent of the drivers, and would allow the memory management architecture to be changed relatively easily at a later date, since the resulting damage would be contained within the core library that implemented the driver infrastructure API.

I think this is the right approach because:

o - Decoupling the drivers from the memory management architecture reduces the cost of future memory management architecture changes and keeps our options open, so it is a lower risk approach than choosing a memory management architecture now and trying to stick with it.

o - A good high level driver infrastructure API will clean up the drivers considerably.

o - Containing the code which performs low-level memory manipulations within a core driver infrastructure library written by an expert will result in higher overall quality across all the drivers.

o - As a driver author, given a high level driver infrastructure API which decouples me from the memory management architecture, the choice of P==M, P2M or VP is no longer my concern.

I have made a first attempt at defining a high level driver infrastructure API for general use by xen split drivers. This is the xenidc API and, whilst it is designed for general use, it currently has one client: the split USB driver.

I believe that xenidc completely decouples its clients from the memory management architecture such that, for example, there should be no changes required in the USB driver code when porting it from x86 to ia64 and PPC (this will be true whether or not the memory management architecture for those platforms is changed to be more like x86). All required changes ought to be contained within the xenidc implementation, and therefore would only need to be implemented once for all clients of xenidc.

The choice of a common memory management architecture, or different memory management architectures across platforms, or different options for memory management architecture for a particular platform, or different options for memory management architecture at run-time for transparent virtualization, can all be contained within the xenidc implementation.

In addition to decoupling the client driver code from the memory management architecture, the xenidc API provides the following (a sketch of what such an API surface might look like follows this list):

o - Convenient inter-domain communication primitives which encapsulate the rather complicated state machine required for correct set-up and tear-down of inter-domain communication channels for (un)loadable driver modules.

o - A convenient inter-domain bulk transport.

o - An up-front-reservation resource management strategy.

o - Driver forwards-compatibility with a network transparent xenidc implementation.
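[Editorial note: these prototypes are invented for illustration only -- they are NOT the real xenidc API; see the attached patch for that. The point is the shape of such an API: drivers talk in channels and transfers, never in grant/mapping/machine-address primitives, so the P==M / P2M / VP choice stays hidden inside the core library.]

struct xenidc_channel;          /* opaque to the driver */
struct xenidc_buffer;           /* driver-local data description */

/* Channel set-up/tear-down state machine lives in the library. */
struct xenidc_channel *xenidc_channel_open(domid_t peer, int port);
void xenidc_channel_close(struct xenidc_channel *ch);

/* Bulk transport: the library decides internally whether to copy,
 * flip pages, or map foreign frames. */
int xenidc_send(struct xenidc_channel *ch,
                struct xenidc_buffer *buf, size_t len);
int xenidc_recv(struct xenidc_channel *ch,
                struct xenidc_buffer *buf, size_t len);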
I have attached the latest xenidc patch, which includes documentation of the xenidc API (added by the patch to the Xen interface document). I have also attached the latest USB patch as an example of a client of the xenidc API. (Since the last time I posted these patches I have fixed a couple of compiler warnings for the X86_64 build.)

A few points to note:

o - xenidc is an infrastructure for the Xen-specific split drivers. Xenidc doesn't directly address the issue of making the native drivers work correctly under virtualization, but does allow you to do that however you like across different architectures whilst maintaining common code for all the split drivers.

o - This is just a first attempt, which I wrote mainly to decouple the USB driver from churn in the underlying infrastructure. The API is generally useful but only covers the operations that were actually required for the USB driver. There is already enough in the API to base other drivers on it, but the API would need to be fleshed out with some different kinds of operations before it would be possible to implement all drivers with the same efficient primitives that are used today.

o - Unfortunately I didn't get funding to attend the Xen summit, so I won't be there to present on xenidc. I'm not concerned about whether xenidc gets accepted as-is, but I do hope it will be useful as an example of the kind of API that we could have. I'll be happy to answer any questions on the list.

Harry.
Anthony Liguori
2006-Jan-11 16:22 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Gerd Hoffmann wrote:

> Linux has a nice API for DMA memory management, see
> Documentation/DMA-mapping.txt. Basically you pass in a "struct page"
> and an offset (within that page) and get back a dma address you can
> pass on to your hardware. [...]

Excellent, thanks for the reference!

> A bit more tricky are DMA transfers for _other_ domains (i.e. what
> the blkback driver has to do). blkback maps the foreign domain pages
> into its own address space, and I think there is no way around that
> right now API-wise, as otherwise there isn't a "struct page" for the
> page ...

There are, of course, other ways around this. One could have a hypervisor level DMA API that allowed bulk transfer of memory between domains (either by copying or page flipping, depending on the size of the buffer). Another option would be a separate pool of sharable memory that could be mapped appropriately into a domain's VPM space.

Regards,

Anthony Liguori
Mark Williamson
2006-Jan-11 16:22 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
> Is VP on x86 expensive in terms of performance or complexity?

One nasty thing for VP on x86 is the compulsory hardware PT walker - IA64 allows the hypervisor to handle TLB fills on behalf of a guest, so that it can perform phys-to-machine translation. IA64 has a hardware PT walker but you aren't *forced* to use it.

IIRC, PPC also performs P-to-M translations in the hypervisor, but I vaguely recall that happening during an explicit pagetable update hypercall - kind of a middle road between the x86 and IA64 approaches... Some PPC guy may jump in and correct me at this point, though ;-)

> I imagine that you would have to always have shadow paging enabled
> but you could still do bulk updates ala writable page tables so the
> performance cost should be minimal I would think.
>
> Trying to understand the memory system in more detail so any
> additional info is much appreciated :-)

I don't see why that couldn't perform decently, although it'd have more overhead than allowing the guest to manage its pagetables directly... I *thought* this was intended to be supported at some point, but I'm not sure if it's been needed yet. Others may have more concrete numbers for the performance - I think writable PTs got benchmarked against shadowing at some point.

Cheers,
Mark

--
> Just a question. What use is a unicycle with no seat? And no pedals!
Me: To answer a question with a question: What use is a skateboard?
> Skateboards have wheels.
Me: My wheel has a wheel!
Anthony Liguori
2006-Jan-11 16:25 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Hi Keir,

Keir Fraser wrote:

> On an architecture where VP is cheaper to implement than on x86, it
> may well make sense to do that in preference to P2M. As you say, it
> makes certain future extensions less of a pain to implement.

Is VP on x86 expensive in terms of performance or complexity?

I imagine that you would have to always have shadow paging enabled, but you could still do bulk updates a la writable page tables, so the performance cost should be minimal I would think.

Trying to understand the memory system in more detail, so any additional info is much appreciated :-)

Thanks,

Anthony Liguori
Keir Fraser
2006-Jan-11 16:38 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On 11 Jan 2006, at 16:25, Anthony Liguori wrote:

> Is VP on x86 expensive in terms of performance or complexity?
>
> I imagine that you would have to always have shadow paging enabled,
> but you could still do bulk updates a la writable page tables, so the
> performance cost should be minimal I would think.
>
> Trying to understand the memory system in more detail, so any
> additional info is much appreciated :-)

Shadow page tables do have a measurable overhead, although it's not *that* big for most workloads. We already support a shadow-translate mode (well, the xenlinux support for it may be broken right now, but it's worked in the past) for paravirt guests, and various people researching new xen features want to make use of it. I can imagine that we will support both modes even on x86 at some point in the future, and users can make the features/performance tradeoff.

 -- Keir
Anthony Liguori
2006-Jan-11 16:41 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Mark Williamson wrote:
>> I imagine that you would have to always have shadow paging enabled,
>> but you could still do bulk updates a la writable page tables, so the
>> performance cost should be minimal, I would think.
>>
>> Trying to understand the memory system in more detail, so any
>> additional info is much appreciated :-)
>
> I don't see why that couldn't perform decently, although it'd have
> more overhead than allowing the guest to manage its pagetables
> directly... I *thought* this was intended to be supported at some
> point, but I'm not sure if it's been needed yet. Others may have more
> concrete numbers for the performance - I think writable PTs got
> benchmarked against shadowing at some point.

Just to be thorough, was the shadow paging code a "pure" shadow page
table, where every PTE write trapped to the hypervisor, or were bulk PMD
updates sent to the hypervisor?

I'm surprised there would be a measurable difference with shadow paging,
as it should only require a potential allocation (which could be
fast-pathed) and, in the normal case, a couple of extra reads/writes. I
would think that cost would be overshadowed by the original cost of the
context switch.

Of course, I guess it wouldn't be that much of a shock to me that the
overhead is at least measurable...

Regards,

Anthony Liguori
> Just to be thorough, was the shadow paging code a "pure" shadow page
> table, where every PTE write trapped to the hypervisor, or were bulk
> PMD updates sent to the hypervisor?

All of Xen's pagetable options are able to do high-performance bulk
updates (though it's actually typically more important to optimize for
the demand-fault path). There was some quite extensive benchmarking done
~9 months back, and we're hoping to write it up and submit it somewhere.
The algorithms have evolved a bit since, so we need to rerun things.

> I'm surprised there would be a measurable difference with shadow
> paging, as it should only require a potential allocation (which could
> be fast-pathed) and, in the normal case, a couple of extra
> reads/writes. I would think that cost would be overshadowed by the
> original cost of the context switch.

Hint: you need to propagate dirty and accessed bits back to the guest
pagetable.

> Of course, I guess it wouldn't be that much of a shock to me that the
> overhead is at least measurable...

It's certainly measurable, and it certainly dominates the virtualization
overhead of some workloads.

Ian
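For illustration, here is a minimal sketch of the cost being hinted at
(names invented; this is not Xen's actual shadow code). The hardware sets
Accessed/Dirty bits only in the shadow entry, so before a shadow PTE is
dropped or resynced those bits must be copied back into the guest PTE, or
the guest's page-replacement and writeback decisions break:

    /* Hypothetical sketch: fold hardware-set A/D bits from a shadow PTE
     * back into the corresponding guest PTE before discarding the shadow. */
    #define _PAGE_ACCESSED 0x020UL
    #define _PAGE_DIRTY    0x040UL

    typedef unsigned long pte_t;

    static void sync_ad_bits(pte_t *guest_pte, pte_t shadow_pte)
    {
        /* Everything else in the shadow was derived from the guest
         * entry; only the A/D bits can legitimately differ. */
        *guest_pte |= shadow_pte & (_PAGE_ACCESSED | _PAGE_DIRTY);
    }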
Anthony Liguori
2006-Jan-11 17:38 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Ian Pratt wrote:
> Hint: you need to propagate dirty and accessed bits back to the guest
> pagetable.

Ahh, I see now. Thanks :-)

Regards,

Anthony Liguori

> > Of course, I guess it wouldn't be that much of a shock to me that
> > the overhead is at least measurable...
>
> It's certainly measurable, and it certainly dominates the
> virtualization overhead of some workloads.
>
> Ian
Hollis Blanchard
2006-Jan-11 21:16 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On Wed, 2006-01-11 at 16:22 +0000, Mark Williamson wrote:
> IIRC, PPC also performs P-to-M translations in the hypervisor, but I
> vaguely recall that happening during an explicit pagetable update
> hypercall - kind of a middle road between the x86 and IA64
> approaches... Some PPC guy may jump in and correct me at this point,
> though ;-)

It's pretty simple: for Xen/x86, the kernel does translation and the
hypervisor does validation. For PAPR on PPC hardware, the hypervisor
does both translation and validation. This is done for every mapping
hcall: the domain makes an hcall to map physical address P, and the
hypervisor translates to machine address M and allows or rejects the
request.

Page fault exceptions are delivered by the processor to the domain (not
the hypervisor), which reacts by making a mapping hcall.

--
Hollis Blanchard
IBM Linux Technology Center
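For illustration, a sketch of that flow under assumed names (h_enter is
modelled loosely on PAPR's H_ENTER; p2m_lookup, owns_frame and hpt_insert
are hypothetical helpers): the guest asks to map guest-physical P, and
the hypervisor performs both the translation and the validation.

    struct domain;

    typedef unsigned long paddr_t;   /* guest-physical address */
    typedef unsigned long maddr_t;   /* machine address */

    extern maddr_t p2m_lookup(struct domain *d, paddr_t p);
    extern int     owns_frame(struct domain *d, maddr_t m);
    extern void    hpt_insert(maddr_t m, unsigned long pte_flags);

    #define H_PARAMETER (-4L)        /* illustrative error code */

    long h_enter(struct domain *d, paddr_t p, unsigned long pte_flags)
    {
        maddr_t m = p2m_lookup(d, p);  /* translation: in the hypervisor */

        if (!owns_frame(d, m))         /* validation: also the hypervisor */
            return H_PARAMETER;        /* reject the request */

        hpt_insert(m, pte_flags);      /* install the machine mapping */
        return 0;
    }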
Hollis Blanchard
2006-Jan-11 21:36 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
On Tue, 2006-01-10 at 16:39 -0800, Magenheimer, Dan (HP Labs Fort
Collins) wrote:
> > So ia64 dom0 physical 0 is machine 0? Where does Xen live in machine
> > space?
> >
> > PowerPC exception handlers are architecturally hardcoded to the
> > first couple pages of memory, so Xen needs to live there. Linux
> > expects it is booting at 0 of course, so dom0 runs in an offset
> > physical address space.
>
> On ia64, Xen (and Linux when booting natively) is relocatable.
> Machine address 0 is not special on ia64 like it is on PowerPC.

Right, so P==M for dom0 (or any domain) will not work on PowerPC.

> Per the previous exchange with Anthony, there are many advantages
> to being able to move memory around invisibly to domains, which
> is easy with VP and much harder with P2M. The current debate on
> Xen/ia64 is just for domain0 but it could expand...

As far as I can see, dom0 must be aware of the machine address space, so
that means P2M for PowerPC. dom0 is a special case: do you really need
to worry about migrating dom0, or memory compacting with other domains?

As for the question of domU being VP or P2M, I see no reason it
shouldn't be VP. IO-capable domUs (driver domains) could be VP with
proper IOMMU support. The PowerPC PAPR and Xen/ia64 implementations
demonstrate that this works...

--
Hollis Blanchard
IBM Linux Technology Center
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-12 00:48 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> > On ia64, Xen (and Linux when booting natively) is relocatable.
> > Machine address 0 is not special on ia64 like it is on PowerPC.
>
> Right, so P==M for dom0 (or any domain) will not work on PowerPC.

Are machine addresses 0-n the only range that is special? And can one
safely assume that DMA will never occur in this range? If so, then a
single "special" mapping in the hypervisor could get around this. While
I suppose this is more P~=M than strictly P==M, it would seem a
reasonable alternative to major Linux changes.

> > Per the previous exchange with Anthony, there are many advantages
> > to being able to move memory around invisibly to domains, which
> > is easy with VP and much harder with P2M. The current debate on
> > Xen/ia64 is just for domain0 but it could expand...
>
> As far as I can see, dom0 must be aware of the machine address space,
> so that means P2M for PowerPC. dom0 is a special case: do you really
> need to worry about migrating dom0, or memory compacting with other
> domains?

No; migrating dom0 or any driver domain with direct device access is
unreasonable, at least unless all device access is virtualized (e.g.
Infiniband?). I view domain0 as closer to a semi-privileged extension of
Xen. Not sure what you mean by memory compacting...

> As for the question of domU being VP or P2M, I see no reason it
> shouldn't be VP. IO-capable domUs (driver domains) could be VP with
> proper IOMMU support. The PowerPC PAPR and Xen/ia64 implementations
> demonstrate that this works...

Ignoring the page table problems on x86 (which VMware demonstrates are
more of a performance issue than a functional issue), if DMA can be
handled invisibly, I think everyone agrees that VP has significant
advantages over either P==M or P2M. But to clarify, Xen/ia64 domU is
currently VP only because it doesn't do DMA. Driver domains will
complicate this.

Dan
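For what it's worth, the "P~=M" idea above sketches out very simply; the
constants here are invented placeholders, not real PowerPC or Xen values:

    /* Identity-map everything except the low range that PowerPC reserves
     * for exception vectors (and hence for Xen); dom0 sees that range
     * aliased to one relocated region. */
    #define XEN_LOW_RESERVED  0x4000UL      /* hypothetical: vectors + Xen */
    #define DOM0_LOW_ALIAS    0x10000000UL  /* hypothetical relocated copy */

    static unsigned long dom0_p_to_m(unsigned long p)
    {
        /* One special mapping for the reserved range; P==M elsewhere. */
        return (p < XEN_LOW_RESERVED) ? (DOM0_LOW_ALIAS + p) : p;
    }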
Tian, Kevin
2006-Jan-12 02:44 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> From: Mark Williamson
> Sent: 12 January 2006 00:23
>
> > Is VP on x86 expensive in terms of performance or complexity?
>
> One nasty thing for VP on x86 is the compulsory hardware PT walker -
> IA64 allows the hypervisor to handle TLB fills on behalf of a guest,
> so that it can perform phys-to-machine translation. IA64 has a
> hardware PT walker, but you aren't *forced* to use it.

To make it clearer, this hardware PT walker on IA64 is not like the
normal multi-level PT walker on x86. Instead it walks a virtually linear
table or a hash table, which is configurable. ;-)

Thanks,
Kevin
Mark Williamson
2006-Jan-16 15:52 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
> To make it clearer, this hardware PT walker on IA64 is not like the
> normal multi-level PT walker on x86. Instead it walks a virtually
> linear table or a hash table, which is configurable. ;-)

Am I right in thinking it's also possible to implement a
"software-filled TLB" on IA64? (As a fallback for when the hardware
assist fails?)

Cheers,
Mark
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-16 22:56 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> > To make it clearer, this hardware PT walker on IA64 is not like the
> > normal multi-level PT walker on x86. Instead it walks a virtually
> > linear table or a hash table, which is configurable. ;-)
>
> Am I right in thinking it's also possible to implement a
> "software-filled TLB" on IA64? (As a fallback for when the hardware
> assist fails?)

Not only possible, but normal. If there is a TLB miss and a VHPT
(virtual hashed page table) miss, software fills both the TLB and the
VHPT.

Is that what you meant?

Dan
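For illustration, that normal sequence reduces to something like the
following (helper names are hypothetical stand-ins, not xen-ia64 code):

    typedef unsigned long pte_t;

    extern pte_t resolve_va(unsigned long va);           /* walk OS page tables */
    extern void  vhpt_insert(unsigned long va, pte_t p); /* fill the VHPT...    */
    extern void  tlb_insert(unsigned long va, pte_t p);  /* ...and the TLB (itc) */

    /* Entered only after both the hardware TLB and the VHPT have missed. */
    void tlb_and_vhpt_miss_handler(unsigned long va)
    {
        pte_t pte = resolve_va(va); /* software supplies the translation */
        vhpt_insert(va, pte);       /* so the hardware walker hits next time */
        tlb_insert(va, pte);        /* and the faulting access can retry now */
    }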
Mark Williamson
2006-Jan-17 02:47 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
> Not only possible, but normal. If there is a TLB miss and a VHPT
> (virtual hashed page table) miss, software fills both the TLB and the
> VHPT.
>
> Is that what you meant?

Yep, that's exactly what I thought happened :-)

IIRC, you said you don't bother with the guest VHPT, right? So
presumably you reflect TLB misses to the guest and intercept its TLB
fill instruction, apply the P2M translation, then add it to *Xen*'s VHPT
and fill the TLB correctly?

I know I've followed some of these discussions before - just a bit rusty
now ;-)

Cheers,
Mark
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-17 03:03 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> > Not only possible, but normal. If there is a TLB miss and a VHPT
> > (virtual hashed page table) miss, software fills both the TLB and
> > the VHPT.
> >
> > Is that what you meant?
>
> Yep, that's exactly what I thought happened :-)
>
> IIRC, you said you don't bother with the guest VHPT, right? So
> presumably you reflect TLB misses to the guest and intercept its TLB
> fill instruction, apply the P2M translation, then add it to *Xen*'s
> VHPT and fill the TLB correctly?
>
> I know I've followed some of these discussions before - just a bit
> rusty now ;-)

Exactly... except for one nice shortcut that Matt Chapman added. Since
the VHPT is architected and the guest is expecting that it may be
walked, when Xen intercepts the initial TLB miss, it can first look in
the guest VHPT to resolve the miss (and add it to Xen's VHPT and fill
the TLB) rather than reflect the TLB miss to the guest. Only if the
translation isn't found in the guest VHPT (or if looking for it -- a
user_access -- causes another TLB miss) is the TLB miss reflected to the
guest.

Thus, guests have the benefit not only of the hardware TLB and Xen's
VHPT but also of their own VHPT.

Dan
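For illustration, the shortcut amounts to putting a fast path in front of
the reflection (all names here are hypothetical, not the actual xen-ia64
handlers):

    struct vcpu;
    typedef unsigned long pte_t;

    extern int   guest_vhpt_lookup(struct vcpu *v, unsigned long va,
                                   pte_t *gpte);            /* may itself fault */
    extern pte_t p2m_translate(struct vcpu *v, pte_t gpte); /* phys -> machine */
    extern void  xen_vhpt_insert(unsigned long va, pte_t mpte);
    extern void  tlb_insert(unsigned long va, pte_t mpte);
    extern void  reflect_to_guest(struct vcpu *v, unsigned long va);

    void xen_tlb_miss(struct vcpu *v, unsigned long va)
    {
        pte_t gpte;

        if (guest_vhpt_lookup(v, va, &gpte) == 0) {
            pte_t mpte = p2m_translate(v, gpte); /* apply P2M first */
            xen_vhpt_insert(va, mpte);           /* fill Xen's VHPT */
            tlb_insert(va, mpte);                /* and the TLB */
            return;          /* miss resolved without entering the guest */
        }
        reflect_to_guest(v, va); /* guest's own miss handler runs instead */
    }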
Mark Williamson
2006-Jan-17 03:16 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
> > I know I've followed some of these discussions before - just a bit
> > rusty now ;-)
>
> Exactly... except for one nice shortcut that Matt Chapman added. Since
> the VHPT is architected and the guest is expecting that it may be
> walked, when Xen intercepts the initial TLB miss, it can first look in
> the guest VHPT to resolve the miss (and add it to Xen's VHPT and fill
> the TLB) rather than reflect the TLB miss to the guest. Only if the
> translation isn't found in the guest VHPT (or if looking for it -- a
> user_access -- causes another TLB miss) is the TLB miss reflected to
> the guest.
>
> Thus, guests have the benefit not only of the hardware TLB and Xen's
> VHPT but also of their own VHPT.

I wondered if that'd be useful to do. I guess Linux would naturally try
to fill the VHPT eagerly as a performance optimisation, so this should
work quite nicely - you'd only get the extra cost of reflecting the
fault at times when even native Linux would have missed the VHPT. Sweet!

And the real VHPT is per (logical) CPU? I guess walking the guest VHPT
additionally gives you (effectively) a VHPT per virtual processor, but
with the cost coming out of domain memory. The fast-path VHPT in Xen
doesn't need to have such a high hit rate as a result, I assume.

Had you evaluated the costs of having the guest explicitly update Xen's
VHPT? (Or at least hint that an update was necessary for some reason?)

Cheers,
Mark
Tian, Kevin
2006-Jan-17 04:11 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> From: Mark Williamson [mailto:mark.williamson@cl.cam.ac.uk]
> Sent: 17 January 2006 11:17
>
> I wondered if that'd be useful to do. I guess Linux would naturally
> try to fill the VHPT eagerly as a performance optimisation, so this
> should work quite nicely - you'd only get the extra cost of reflecting
> the fault at times when even native Linux would have missed the VHPT.
> Sweet!
>
> And the real VHPT is per (logical) CPU? I guess walking the guest VHPT
> additionally gives you (effectively) a VHPT per virtual processor, but
> with the cost coming out of domain memory. The fast-path VHPT in Xen
> doesn't need to have such a high hit rate as a result, I assume.

You capture the point there. ;-)

There are currently two solutions co-existing in xen-ia64: a per-LP
(logical processor) VHPT with a simplified vTLB, and a per-VP (virtual
processor) VHPT with a hash vTLB. The former is used by dom0/domU, while
the latter is used for domVTI. The vTLB is the pool that tracks guest
TLB insertion/purge operations, and thus behaves like a shadow of the
machine TLB. The simplified vTLB meets only the minimal architectural
requirements -- 8 DTR/ITR entries and 1 DTC/ITC -- and thus has a lower
hit rate. The hash vTLB is a hash-distributed table with collision
support, which requires more memory but gives a higher hit rate.

Right now it is more urgent to merge the two solutions into something
general than to settle the per-VP versus per-LP strategy, which has been
discussed many times before and is actually not that obvious without a
general solution and benchmark data. We'll have a discussion on this
topic in tomorrow's summit.

> Had you evaluated the costs of having the guest explicitly update
> Xen's VHPT? (Or at least hint that an update was necessary for some
> reason?)

Letting the guest explicitly update Xen's VHPT has several obvious
limitations:

- The VHPT on IA64 has two formats: short and long. To support
  different guest OSes, Xen has to construct a long-format VHPT. Linux
  currently uses the short format, so this would mean a lot of
  modification for xenlinux to operate on Xen's long-format VHPT
  directly. (See the sketch after this message.)

- It also conflicts with the current region-id virtualization policy.
  On xen/ia64, a region id describes an address space, and it is
  virtualized: fewer bits are exposed to xenlinux than the machine
  actually supports. If xenlinux directly ran the hash algorithm on a
  virtualized region id, the result would be meaningless.

Thanks,
Kevin
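For reference, the two architected entry formats behind Kevin's first
point look roughly like this (field comments abbreviated; see the ia64
architecture manuals for the exact layouts):

    /* Short format (8 bytes): effectively a bare PTE, used by Linux in
     * a per-region virtually linear table. */
    struct vhpt_short {
        unsigned long pte;   /* translation + access rights */
    };

    /* Long format (32 bytes): hashed and self-tagging, so one global
     * table can serve guests with different address-space layouts --
     * which is why Xen would need it. */
    struct vhpt_long {
        unsigned long pte;   /* translation + access rights */
        unsigned long itir;  /* page size, protection key */
        unsigned long tag;   /* identifies which VA/RID hashed here */
        unsigned long avail; /* ignored by the hardware walker */
    };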