Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-10 19:26 UTC
[Xen-devel] Essay on an important Xen decision (long)
A fundamental architectural decision has to be made for Xen regarding the handling of physical/machine memory; at a high level, the question is:

   Should Xen drivers be made more flexible to accommodate
   different approaches to managing physical memory, or
   should other architectures be required to conform to
   the Xen/x86 model?

A more detailed description of the specific decision is below. The Xen/ia64 community would like to make this decision soon -- possibly at the Xen summit -- as the next steps of Xen/ia64 functionality are significantly affected. Since either choice has an impact on common code and on future Xen architecture, this decision must involve core Xen developers and the broader Xen community rather than just Xen/ia64 developers.

While this may seem to be a trivial matter, such fundamental choices often have a way of pre-selecting future design and implementation directions that can have major negative or positive impacts -- possibly unexpected -- on different parties. For example, a decision might make a Xen developer's life easier but create headaches for a distro or a Linux maintainer. If nothing else, discussing fundamental decision points often helps to bring out and codify/document hidden assumptions about the future.

This is a lengthy document, but I hope to touch on most of the various issues and tradeoffs. Understanding -- or, at a minimum, reading -- this document should probably be a prerequisite for involvement in discussions to resolve this. I would encourage all readers to give the issues and tradeoffs some thought, as the "obvious x86" answer may not be the best answer for the future of Xen.

First, a little terminology and background:

In a virtualized environment, the resources of the physical machine must be subdivided and/or shared between multiple virtual machines. Just as an OS manages memory for its applications, one of the primary roles of a hypervisor is to provide the illusion to each guest OS that it owns some amount of "RAM" in the system. Thus there are two kinds of physical memory addresses: the addresses that a guest believes to be physical addresses, and the addresses that actually refer to RAM (e.g. bus addresses). The literature (and Xen) confusingly labels these as "physical" addresses and "machine" addresses. In a virtualized environment, there must be some way of maintaining the relationship -- or "mapping" -- between physical addresses and machine addresses.

In Xen (across all architectures), there are currently three different approaches for mapping physical addresses to machine addresses:

1) P==M: The guest is given a subset of machine memory that it can access "directly". Accesses to machine memory addresses outside of this range must somehow be restricted (but not necessarily disallowed) by Xen.

2) guest-aware p!=m (P2M): The guest is given max_pages of contiguous physical memory starting at zero, plus the knowledge that physical addresses are different from machine addresses. The guest must understand the difference between a physical address and a machine address and utilize the correct one in different situations.

3) virtual physical (VP): The guest is given max_pages of contiguous physical memory starting at zero. Xen provides the illusion to the guest that this is machine memory; any physical-to-machine translation required for functional correctness is handled invisibly by Xen. VP cannot be used by guests that directly program DMA-based I/O devices because a DMA device requires a machine address and, by definition, the guest knows only about physical addresses.
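[Editorial note: to make the three models concrete, here is a minimal C sketch. This is illustrative only -- not actual Xen or Xenlinux code -- and the p2m table name is invented.]

/* Illustrative sketch only.  "pfn" is a guest physical frame number,
 * "mfn" a machine frame number. */

typedef unsigned long pfn_t;
typedef unsigned long mfn_t;

/* 1) P==M: physical addresses simply are machine addresses (within
 *    the range the guest was given); Xen polices out-of-range access. */
static mfn_t pfn_to_mfn_p_eq_m(pfn_t pfn)
{
        return (mfn_t)pfn;
}

/* 2) P2M: the guest holds a translation table and must itself choose
 *    the right address type, e.g. machine addresses for DMA. */
extern mfn_t guest_p2m_table[];         /* hypothetical table name */
static mfn_t pfn_to_mfn_p2m(pfn_t pfn)
{
        return guest_p2m_table[pfn];
}

/* 3) VP: there is no guest-visible translation at all; Xen translates
 *    invisibly (e.g. while handling a TLB fill), which is exactly why
 *    a VP guest cannot program a DMA device directly. */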
Xen/x86 and Xen/x86_64 use P2M, but switch to VP (aka "shadow mode") for an unprivileged guest when a migration is underway. Xen/ia64 currently uses P==M for domain0 and VP for unprivileged guests. Xen/ppc intends to use VP only.

There is an architectural proposal to change Xen/ia64 so that domain0 uses P2M instead of P==M. We will call this choice P2M and the choice to stay on the current path P==M.

Here's what I think are the key issues/tradeoffs:

XEN CODE IMPACT

Some Xen drivers, such as the blkif driver, have been "converted" to accommodate P==M. Others have not. For example, the balloon driver currently assumes domain0 is P2M and thus does not currently work on Xen/ia64 or Xen/ppc. The word "converted" is quoted because nobody is particularly satisfied with the current state of the converted drivers. Many apparently significant function calls are #define'd out of existence by macros. Other code does radically different things depending on the architecture, or on whether it is being executed by dom0 or an unprivileged domain. And a few ifdefs are sprinkled about. In short, what's done works, but it is an ugly hack. Some believe that the best way to solve this mess is for other architectures to do things more like Xen/x86. Others believe there is an advantage to defining clear abstractions and making the drivers truly more architecture-independent.

P2M will require some rewriting of existing Xen/ia64 core code and significant changes to Xenlinux/ia64 code, but will allow much easier porting of Xen's balloon/networking/migration drivers and also enable some simplifying changes in the Xen block driver. It is fair to guess that it will take at least several weeks or months to rewrite and debug the core and Xenlinux code to get Xen/ia64 back to where it is today, but future driver work will be much faster. Fewer differences from Xen/x86 mean less maintenance work for Xen core and Xen/ia64 developers. I'd imagine also that more code will be shared between Xen/VT-i and Xen/VT-x.

P==M will require Xen's balloon/networking/migration drivers to evolve to incorporate non-P2M models. This can be done, but is most likely to end up (at least in the short term) as a collection of unpalatable hacks like those in the Xen block driver. However, making Xen drivers more tolerant of different approaches may be a good thing for Xen in the long run.

XENLINUX IMPACT

Today's operating systems are not implemented with an understanding that a physical address and a machine address might be different. Building this awareness into an OS requires non-trivial source code change. For example, Xenlinux/x86 maintains a "p2m" mapping table for quick translation and provides an "m2p" hypercall to keep Xen in sync. OS code that manipulates physical addresses must be modified to access/manage this table and make hypercalls when appropriate. Macros can hide much of the complexity, but much OS/driver code exists that does not use standard macros. There is some disagreement about how extensive the required source code changes are, and how difficult it will be to maintain these changes across future versions of guest OSes. One illustrative example, however: in paravirtualizing Xenlinux/ia64, seven header files are changed; it is closer to 40 for Xenlinux/x86.
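[Editorial note: a simplified sketch of the machinery described above, loosely modeled on the Xenlinux/x86 sparse tree of this era; exact names and details vary by version.]

/* Simplified sketch of the Xenlinux/x86 translation machinery. */

extern unsigned long *phys_to_machine_mapping;  /* guest's p2m table  */
extern unsigned long *machine_to_phys_mapping;  /* m2p table, from Xen */

#define pfn_to_mfn(pfn) (phys_to_machine_mapping[(pfn)])
#define mfn_to_pfn(mfn) (machine_to_phys_mapping[(mfn)])

/* Every piece of OS code that hands a "physical" address to hardware
 * or to Xen must be found and converted to use the machine variant: */
#define phys_to_machine(phys) \
        ((pfn_to_mfn((phys) >> PAGE_SHIFT) << PAGE_SHIFT) | \
         ((phys) & ~PAGE_MASK))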
Related: some would assert that pushing a small number of changes into Linux (or any OS, open source or not) is far easier than pushing a large number of changes into Linux. Until all the Xen/x86 changes are in, it remains to be seen whether this is true or not. There is a reasonable concern that the broad review required for such an extensive set of changes will involve a large number of people with a large number of agendas and force a number of Xen design issues to be revisited -- at least clearly justified, if not changed. This is especially true if Xen's foes have any influence in the process.

Transparent paravirtualization (also called "shared binary") is the ability for the same binary to be used both as a Xen guest and natively on real hardware. Xenlinux/ia64 currently supports this; indeed, ignoring a couple of existing bugs, the same Xenlinux/ia64 binary can be used natively, as domain0, and as an unprivileged domain. There have been proposals to do the same for Xenlinux/x86, but the amount of code changed is much higher. There is debate about the cost/benefit of transparent paravirtualization, but the primary beneficiaries -- distros and end customers -- are not very well represented here.

With P2M, it is unlikely that Xenlinux/ia64 will ever again be transparently paravirtualizable. As with Xenlinux/x86, the changes will probably be pushed into a subarch (mach-xen). Since Linux/ia64 has a more diverse set of subarches, there may be additional work to ensure that Xen is orthogonal to (and thus works with) all the subarches.

P==M would continue to allow transparent paravirtualization. This, plus the reduced number of changes, should make it easier to get Xen/ia64 support into Linux/ia64 (assuming Xen/x86 support gets included in Linux/x86).

DRIVER DOMAINS

Driver domains are "coming soon" and support for driver domains is a "must"; however, support for hybrid driver domains (i.e. domains that utilize both backend and frontend drivers) is open to debate. It can be assumed, however, that all driver domains will require DMA access.

P2M should make driver domains easier to implement (once the initial Xenlinux/ia64 work is completed) and able to support a broader range of functionality. P==M may disallow hybrid driver domains and create other restrictions, though some creative person may be able to solve these.

FUTURE XEN FEATURE SUPPORT

None of the approaches has been "design-tested" significantly for support of, or compatibility with, future Xen functionality such as oversubscription or machine-memory hot-plug, nor for exotic machine memory topologies such as NUMA or discontig (sparsely populated) memory. Such functionalities and topologies are much more likely to be encountered in high-end server architectures than in widely available PCs and low-end servers. There is some debate as to whether the existing Xen memory architecture will easily evolve to accommodate these future changes, or whether more fundamental changes will be required. Architectural decisions and restrictions should be made with these uncertainties in mind.

Some believe that discovery and policy for machine memory will eventually need to move out of Xen into domain0, leaving only the enforcement mechanism in Xen. For example, oversubscription, NUMA, and hot-plug memory support are likely to be fairly complicated, and a commonly stated goal is to move unnecessary complexity out of Xen. And the plethora of recent changes in Linux/ia64 involving machine memory models indicates there are still many unknowns.
P==M more easily supports a model where domain0 owns ALL of machine memory *except* a small amount reserved for and protected by Xen itself. If this is all true, Xen/x86 may eventually need to move to a dom0 P==M model, in which case it would be silly for Xen/ia64 to move to P2M and then back to P==M.

Others think these features will be easy to implement in Xen and, with minor changes, entirely compatible with P2M. And that P2M is the once and future model for domain0.

SUMMARY

I'm sure there are more issues and tradeoffs that will come up in discussion, but let me summarize these:

Move domain0 to P2M:
+ Fewer differences in Xen drivers between Xen/x86 and Xen/ia64
+ Fewer differences in Xen drivers between Xen/VT-x and Xen/VT-i
+ Easier to implement remaining Xen drivers for Xen/ia64
- Major changes may require months for Xen/ia64 to regain stability
- Many more changes to Xenlinux/ia64; more difficulty pushing upstream
- No attempt to make Xen more resilient for future architectures

Leave domain0 as P==M:
+ Fewer changes in Xenlinux; easier to push upstream
+ Making Xen more flexible is a good thing
? May provide better foundation for future features (oversubscription, NUMA)
- More restrictions on driver domains
- More hacks required for some Xen drivers, or
- More work to better abstract and define a portable driver architecture
Mark Williamson
2006-Jan-10 19:34 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Dan,

Thanks for the summary, it's nice to see all the arguments presented together.

> Xen/x86 and Xen/x86_64 use P2M, but switch to VP (aka "shadow
> mode") for an unprivileged guest when a migration is underway.
> Xen/ia64 currently uses P==M for domain0 and VP for unprivileged
> guests. Xen/ppc intends to use VP only.

NB. the shadow mode for migration (logdirty) doesn't actually virtualise the physical <-> machine mapping - a paravirt guest on x86 always knows where all its pages are in machine memory. All that's being hidden in this case is that the pagetables are being shadowed (so that pages can be transparently write protected).

> Driver domains are "coming soon" and support of driver domains is a
> "must", however support for hybrid driver domains (i.e. domains that
> utilize both backend and frontend drivers) is open to debate. It can
> be assumed however that all driver domains will require DMA access.
>
> P2M should make driver domains easier to implement (once the initial
> Xenlinux/ia64 work is completed) and able to support a broader range
> of functionality. P==M may disallow hybrid driver domains and
> create other restrictions, though some creative person may be able
> to solve these.

I'd think that driver domains themselves would be quite attractive on IA64 - for big boxes, it allows you to partition the hardware devices *and* potentially improve uptime by isolating driver faults.

For what you call "hybrid" domains, there are people using this for virtual DMZ functionality... I guess it'd be nice to enable it. Presumably the problem is that the backend does some sort of P-to-M translation itself?

Do you have a plan for how you would implement P==M driver domains?

Cheers,
Mark
Anthony Liguori
2006-Jan-10 19:55 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Hi Dan,

Thanks for the thorough explanation of physical memory virtualization. It's a topic on which there isn't a lot of good reference material.

You seem to conclude that the only possible solutions are making the dom0 either P==M or P2M. Is it not possible to make dom0 VP?

If the only issue for making dom0 VP is DMA, wouldn't it be easier to modify the Linux DMA subsystem[1] to make a special hypercall to essentially pin a VP to a particular MFN that could be used for the DMA? One could imagine the hypervisor reserving low memory specifically for DMA such that bounce buffers could be avoided too.

VP makes a lot of interesting memory optimizations considerably easier (memory compacting, swapping, etc.).

[1] Realizing that I know very little about the Linux DMA subsystem, so I don't know if this is outside the realm of possibilities.

Regards,

Anthony Liguori
Hollis Blanchard
2006-Jan-10 23:02 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On Tue, 2006-01-10 at 11:26 -0800, Magenheimer, Dan (HP Labs Fort Collins) wrote:

> 1) P==M: The guest is given a subset of machine memory that it
> can access "directly". Accesses to machine memory addresses
> outside of this range must somehow be restricted (but not
> necessarily disallowed) by Xen.
>
> [...]
>
> Xen/x86 and Xen/x86_64 use P2M, but switch to VP (aka "shadow
> mode") for an unprivileged guest when a migration is underway.
> Xen/ia64 currently uses P==M for domain0 and VP for unprivileged
> guests. Xen/ppc intends to use VP only.
>
> There is an architectural proposal to change Xen/ia64 so that
> domain0 uses P2M instead of P==M. We will call this choice P2M
> and the choice to stay on the current path P==M.

So ia64 dom0 physical 0 is machine 0? Where does Xen live in machine space?

PowerPC exception handlers are architecturally hardcoded to the first couple of pages of memory, so Xen needs to live there. Linux expects it is booting at 0, of course, so dom0 runs in an offset physical address space.

The trouble then comes when dom0 needs to access IO or domU memory; obviously dom0 must have some awareness of the machine space. Accordingly, I'm thinking I'm going to need to install p2m tables in dom0, and once they're there, why not have domU use them too?

-- 
Hollis Blanchard
IBM Linux Technology Center
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-11 00:13 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> NB. the shadow mode for migration (logdirty) doesn't actually
> virtualise the physical <-> machine mapping - a paravirt guest on x86
> always knows where all its pages are in machine memory. All that's
> being hidden in this case is that the pagetables are being shadowed
> (so that pages can be transparently write protected).

Thanks for the clarification!

> I'd think that driver domains themselves would be quite attractive on
> IA64 - for big boxes, it allows you to partition the hardware devices
> *and* potentially improve uptime by isolating driver faults.

Probably true, but I think most "big box" customers are looking for partition isolation beyond what is possible with Xen (at least near-term).

> For what you call "hybrid" domains, there are people using this for
> virtual DMZ functionality... I guess it'd be nice to enable it.
> Presumably the problem is that the backend does some sort of P-to-M
> translation itself?
>
> Do you have a plan for how you would implement P==M driver domains?

Only roughly. Detailed design and implementation was to wait until after driver domain support gets back into Xen/x86 (and until after this P?M decision is made).

Dan
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-11 00:22 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> You seem to conclude that the only possible solutions are making the
> dom0 either P==M or P2M. Is it not possible to make dom0 VP?
>
> If the only issue for making dom0 VP is DMA, wouldn't it be easier to
> modify the Linux DMA subsystem[1] to make a special hypercall to
> essentially pin a VP to a particular MFN that could be used for the
> DMA? One could imagine the hypervisor reserving low memory
> specifically for DMA such that bounce buffers could be avoided too.
>
> [1] Realizing that I know very little about the Linux DMA subsystem,
> so I don't know if this is outside the realm of possibilities.

Technically, if the guest source needs to be changed so that some code deals with physical addresses and other code deals with machine addresses, I would call that a flavor of P2M. If the "DMA subsystem" is the only place where the mapping needs to be done and the affected code can be cleanly isolated, your suggestion is a good one. I'm no expert on Linux DMA code either, but I believe it isn't very clean.

> VP makes a lot of interesting memory optimizations considerably
> easier (memory compacting, swapping, etc.).

Yes, definitely, and oversubscription, different kinds of migration, NUMA physical memory affinity migration, etc.

Dan
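[Editorial note: as a strawman, the "pin a VP page for DMA" idea being discussed might look like the following. This is entirely hypothetical -- no such hypercall exists in Xen.]

/* Hypothetical sketch only -- the hypercall name and interface are
 * invented.  A VP dom0's DMA layer would pin a guest physical frame
 * and ask Xen for a stable machine frame to program into the device. */

long hypervisor_pin_pfn_for_dma(unsigned long pfn, unsigned long *mfn);

static unsigned long vp_dma_addr(unsigned long pfn, unsigned long offset)
{
        unsigned long mfn;

        if (hypervisor_pin_pfn_for_dma(pfn, &mfn) != 0)
                return 0;       /* caller falls back to a bounce buffer */
        return (mfn << PAGE_SHIFT) + offset;    /* machine/bus address */
}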
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-11 00:39 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> So ia64 dom0 physical 0 is machine 0? Where does Xen live in machine
> space?
>
> PowerPC exception handlers are architecturally hardcoded to the first
> couple of pages of memory, so Xen needs to live there. Linux expects
> it is booting at 0 of course, so dom0 runs in an offset physical
> address space.

On ia64, Xen (and Linux when booting natively) is relocatable. Machine address 0 is not special on ia64 like it is on PowerPC.

> The trouble then comes when dom0 needs to access IO or domU memory;
> obviously dom0 must have some awareness of the machine space.
> Accordingly, I'm thinking I'm going to need to install p2m tables in
> dom0, and once they're there, why not have domU use them too?

On ia64, machine memory is exposed to a native OS via EFI (firmware) tables. (I think these are similar to e820 on x86 machines; I don't know how this is done on PowerPC.) When Xen/ia64 starts domain0 (or a domU), it passes a faked EFI table. This table is faked differently for domain0 and domUs.

One solution, for example, would be for Xen to "give" all machine memory to dom0, protecting only a small portion for itself. Then, when other domains are created, all the memory for domUs would be "ballooned" from dom0.

Per the previous exchange with Anthony, there are many advantages to being able to move memory around invisibly to domains, which is easy with VP and much harder with P2M. The current debate on Xen/ia64 is just for domain0, but it could expand...
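[Editorial note: a rough sketch of what "faking" an EFI table amounts to -- fabricating memory descriptors in the map handed to a domain. Simplified and illustrative; the real code builds a complete map including MMIO, runtime-services and reserved ranges.]

#include <linux/efi.h>

/* Fabricate one "conventional memory" (RAM) descriptor for the EFI
 * memory map given to a domain. */
static void fake_ram_descriptor(efi_memory_desc_t *md,
                                unsigned long start, unsigned long npages)
{
        md->type      = EFI_CONVENTIONAL_MEMORY;
        md->phys_addr = start;  /* dom0 P==M: machine address; domU VP: 0 */
        md->num_pages = npages;
        md->attribute = EFI_MEMORY_WB;          /* cacheable RAM */
}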
Tian, Kevin
2006-Jan-11 07:56 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> From: Magenheimer, Dan
> Sent: 2006-Jan-11 3:26

Hi, Dan,

Good background for discussion.

> [...]
> an ugly hack. Some believe that the best way to solve this mess
> is for other architectures to do things more like Xen/x86. Others
> believe there is an advantage to defining clear abstractions and
> making the drivers truly more architecture-independent.

I would say the two options above don't actually conflict. ;-) Move toward Xen/x86 for the things that are really common, with clearer abstraction for the architecture differences. We need to carefully differentiate which part of the mess really comes from architectural reasons, and which part is common but was simply missed due to early quick bring-up requirements. I don't think this has received enough attention so far.

Xen, as a well-formed product, needs to have common policies and common features on all architectures. Maybe implementing the same features will be more difficult, and even bring some performance impact, on some architectures, but it's a must-have requirement from the customer's point of view if the customer demands it. I just raise it here as an important factor when considering the final solution across architectures.

> [...]
> XENLINUX IMPACT
>
> Xen in sync. OS code that manipulates physical addresses must be
> modified to access/manage this table and make hypercalls when
> appropriate. Macros can hide much of the complexity but much OS/driver
> code exists that does not use standard macros. There is some

This seems to be an issue for driver modules that need to be re-compiled... ;-(

> Transparent paravirtualization (also called "shared binary") is the
> ability for the same binary to be used both as a Xen guest and
> natively on real hardware. [...] There is debate
> about the cost/benefit of transparent paravirtualization, but the
> primary beneficiaries -- distros and end customers -- are not very
> well represented here.

Transparency is welcome, but that doesn't mean conservative self-restriction on modifications to xenlinux. Transparency with good performance is the goal to pursue, though xenlinux/x86 does need more effort to make that happen.

> With P2M, it is unlikely that Xenlinux/ia64 will ever again be
> transparently paravirtualizable. As with Xenlinux/x86, the changes
> will probably be pushed into a subarch (mach-xen).

First a sub-arch, and later a configurable feature with negligible impact on native running? ;-)

> [...]
>
> Some believe that discovery and policy for machine memory will
> eventually need to move out of Xen into domain0, leaving only
> enforcement mechanism in Xen. [...] P==M more easily supports a model
> where domain0 owns ALL of machine memory *except* a small amount
> reserved for and protected by Xen itself. If this is all true, Xen/x86
> may eventually need to move to a dom0 P==M model, in which case it
> would be silly for Xen/ia64 to move to P2M and then back to P==M.

I don't think a complete takeover by dom0 is a good design choice.
Moving ownership to dom0 doesn't mean a simple move, since the memory subsystem is the core/base of Xen. Extra context switches are added for any page-related operation. Also, with the P==M model, how do you ensure a scalable allocation environment after a long run? Any activity within dom0 that consumes physical frames actually eats machine frames. Security may be another issue, though I can't come up with a clear example immediately...

> SUMMARY
> [...]

This summary is good.

Thanks,
Kevin
Gerd Hoffmann
2006-Jan-11 09:33 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Hi,

> If the only issue for making dom0 VP is DMA, wouldn't it be easier to
> modify the Linux DMA subsystem[1] to make a special hypercall to
> essentially pin a VP to a particular MFN that could be used for the
> DMA?

Linux has a nice API for DMA memory management, see Documentation/DMA-mapping.txt. Basically you pass in a "struct page" and an offset (within that page) and get back a dma address you can pass on to your hardware. That is required for some architectures where physical addresses (as seen by the CPU) and bus addresses (as seen by the PCI devices) are not identical. It's also needed on archs which have an iommu, to create/delete mapping entries there.

I think that API should do just fine for any DMA transfer dom0 wants to do for its own pages. xenlinux would simply need a special implementation of that API which calls xen to translate the VP address into a dma address (usually the same as the machine address). Probably xen must also handle an iommu (if present) to ensure secure dma once we have hardware which supports this.

A bit more tricky are DMA transfers for _other_ domains (i.e. what the blkback driver has to do). blkback maps the foreign domain pages into its own address space, and I think there is no way around that right now API-wise, as otherwise there isn't a "struct page" for the page ...

cheers,

Gerd

-- 
Gerd 'just married' Hoffmann <kraxel@suse.de>
I'm the hacker formerly known as Gerd Knorr.
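[Editorial note: for reference, a driver's use of the generic DMA API Gerd describes looks roughly like this. Simplified sketch; error handling omitted, and the Xen angle in the comment is speculative.]

#include <linux/dma-mapping.h>

/* Map a page for device access, program the device, then unmap.
 * A VP xenlinux could provide its own implementation of
 * dma_map_page() that makes a (hypothetical) hypercall to translate
 * and pin the page. */
static void example_dma_to_device(struct device *dev, struct page *page,
                                  unsigned long offset, size_t len)
{
        dma_addr_t bus;

        bus = dma_map_page(dev, page, offset, len, DMA_TO_DEVICE);
        /* ... program 'bus' into the hardware, wait for completion ... */
        dma_unmap_page(dev, bus, len, DMA_TO_DEVICE);
}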
Keir Fraser
2006-Jan-11 10:08 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On 10 Jan 2006, at 19:55, Anthony Liguori wrote:

> You seem to conclude that the only possible solutions are making the
> dom0 either P==M or P2M. Is it not possible to make dom0 VP?
>
> If the only issue for making dom0 VP is DMA, wouldn't it be easier to
> modify the Linux DMA subsystem[1] to make a special hypercall to
> essentially pin a VP to a particular MFN that could be used for the
> DMA? One could imagine the hypervisor reserving low memory
> specifically for DMA such that bounce buffers could be avoided too.
>
> VP makes a lot of interesting memory optimizations considerably
> easier (memory compacting, swapping, etc.).

On an architecture where VP is cheaper to implement than on x86, it may well make sense to do that in preference to P2M. As you say, it makes certain future extensions less of a pain to implement.

If ia64 does decide to back off from the P==M route then I suspect VP is the way to go (which is, I think, how ia64 domUs currently work anyway).

 -- Keir
Tristan Gingold
2006-Jan-11 10:46 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On Tuesday 10 January 2006 20:55, Anthony Liguori wrote:

> You seem to conclude that the only possible solutions are making the
> dom0 either P==M or P2M. Is it not possible to make dom0 VP?
>
> If the only issue for making dom0 VP is DMA, wouldn't it be easier to
> modify the Linux DMA subsystem[1] to make a special hypercall to
> essentially pin a VP to a particular MFN that could be used for the
> DMA?

Hi,

A few years ago (it was with Linux 2.2), I wrote device drivers for rather complex hardware. The DMA subsystem didn't really exist then. The main reason is a hardware one: standalone DMA chips do not exist anymore, because nowadays (roughly since PCI) every device chip does DMA by itself.

Tristan.
Harry Butterworth
2006-Jan-11 13:37 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On Tue, 2006-01-10 at 11:26 -0800, Magenheimer, Dan (HP Labs Fort Collins) wrote:

> A fundamental architectural decision has to be made for
> Xen regarding handling of physical/machine memory; at a high
> level, the question is:
>
> Should Xen drivers be made more flexible to accommodate
> different approaches to managing physical memory, or
> should other architectures be required to conform to
> the Xen/x86 model?

I believe the right approach is to decouple the driver implementation from the memory management architecture by defining a high level API to build the drivers on. The API should be expressed in terms of the operations that the drivers need to perform, rather than in terms of the underlying primitives that are actually used to perform those operations.

Such an API would allow decisions about memory management to be made independent of the drivers, and would allow the memory management architecture to be changed relatively easily at a later date, since the resulting damage would be contained within the core library that implemented the driver infrastructure API.

I think this is the right approach because:

o - Decoupling the drivers from the memory management architecture reduces the cost of future memory management architecture changes and keeps our options open, so it is a lower risk approach than choosing a memory management architecture now and trying to stick with it.

o - A good high level driver infrastructure API will clean up the drivers considerably.

o - Containing the code which performs low-level memory manipulations within a core driver infrastructure library written by an expert will result in higher overall quality across all the drivers.

o - As a driver author, given a high level driver infrastructure API which decouples me from the memory management architecture, the choice of P==M, P2M or VP is no longer my concern.

I have made a first attempt at defining a high level driver infrastructure API for general use by xen split drivers. This is the xenidc API and, whilst it is designed for general use, it currently has one client: the split USB driver.

I believe that xenidc completely decouples its clients from the memory management architecture such that, for example, there should be no changes required in the USB driver code when porting it from x86 to ia64 and PPC (this will be true whether or not the memory management architecture for those platforms is changed to be more like x86). All required changes ought to be contained within the xenidc implementation, and therefore would only need to be implemented once for all clients of xenidc.

The choice of a common memory management architecture, or different memory management architectures across platforms, or different options for memory management architecture for a particular platform, or different options for memory management architecture at run-time for transparent virtualization, can all be contained within the xenidc implementation.

In addition to decoupling the client driver code from the memory management architecture, the xenidc API provides the following (a sketch of what such an API surface might look like follows this list):

o - Convenient inter-domain communication primitives which encapsulate the rather complicated state machine required for correct set-up and tear-down of inter-domain communication channels for (un)loadable driver modules.

o - A convenient inter-domain bulk transport.

o - An up-front-reservation resource management strategy.

o - Driver forwards-compatibility with a network transparent xenidc implementation.
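[Editorial note: these prototypes are invented for illustration only -- they are NOT the real xenidc API; see the attached patch for that. The point is the shape of such an API: drivers talk in channels and transfers, never in grant/mapping/machine-address primitives, so the P==M / P2M / VP choice stays hidden inside the core library.]

struct xenidc_channel;          /* opaque to the driver */
struct xenidc_buffer;           /* driver-local data description */

/* Channel set-up/tear-down state machine lives in the library. */
struct xenidc_channel *xenidc_channel_open(domid_t peer, int port);
void xenidc_channel_close(struct xenidc_channel *ch);

/* Bulk transport: the library decides internally whether to copy,
 * flip pages, or map foreign frames. */
int xenidc_send(struct xenidc_channel *ch,
                struct xenidc_buffer *buf, size_t len);
int xenidc_recv(struct xenidc_channel *ch,
                struct xenidc_buffer *buf, size_t len);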
I have attached the latest xenidc patch, which includes documentation of the xenidc API (added by the patch to the Xen interface document). I have also attached the latest USB patch as an example of a client of the xenidc API. (Since the last time I posted these patches I have fixed a couple of compiler warnings for the X86_64 build.)

A few points to note:

o - xenidc is an infrastructure for the Xen-specific split drivers. Xenidc doesn't directly address the issue of making the native drivers work correctly under virtualization, but does allow you to do that however you like across different architectures whilst maintaining common code for all the split drivers.

o - This is just a first attempt, which I wrote mainly to decouple the USB driver from churn in the underlying infrastructure. The API is generally useful but only covers the operations that were actually required for the USB driver. There is already enough in the API to base other drivers on it, but the API would need to be fleshed out with some different kinds of operations before it would be possible to implement all drivers with the same efficient primitives that are used today.

o - Unfortunately I didn't get funding to attend the Xen summit, so I won't be there to present on xenidc. I'm not concerned about whether xenidc gets accepted as-is, but I do hope it will be useful as an example of the kind of API that we could have. I'll be happy to answer any questions on the list.

Harry.
Anthony Liguori
2006-Jan-11 16:22 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Gerd Hoffmann wrote:

> Linux has a nice API for DMA memory management, see
> Documentation/DMA-mapping.txt. Basically you pass in a "struct page"
> and an offset (within that page) and get back a dma address you can
> pass on to your hardware. [...]

Excellent, thanks for the reference!

> A bit more tricky are DMA transfers for _other_ domains (i.e. what
> the blkback driver has to do). blkback maps the foreign domain pages
> into its own address space, and I think there is no way around that
> right now API-wise, as otherwise there isn't a "struct page" for the
> page ...

There are, of course, other ways around this. One could have a hypervisor level DMA API that allowed bulk transfer of memory between domains (either by copying or page flipping, depending on the size of the buffer). Another option would be a separate pool of sharable memory that could be mapped appropriately into a domain's VPM space.

Regards,

Anthony Liguori
Mark Williamson
2006-Jan-11 16:22 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
> Is VP on x86 expensive in terms of performance or complexity?

One nasty thing for VP on x86 is the compulsory hardware PT walker - IA64 allows the hypervisor to handle TLB fills on behalf of a guest, so that it can perform phys-to-machine translation. IA64 has a hardware PT walker but you aren't *forced* to use it.

IIRC, PPC also performs P-to-M translations in the hypervisor, but I vaguely recall that happening during an explicit pagetable update hypercall - kind of a middle road between the x86 and IA64 approaches... Some PPC guy may jump in and correct me at this point, though ;-)

> I imagine that you would have to always have shadow paging enabled
> but you could still do bulk updates ala writable page tables so the
> performance cost should be minimal I would think.
>
> Trying to understand the memory system in more detail so any
> additional info is much appreciated :-)

I don't see why that couldn't perform decently, although it'd have more overhead than allowing the guest to manage its pagetables directly... I *thought* this was intended to be supported at some point, but I'm not sure if it's been needed yet. Others may have more concrete numbers for the performance - I think writable PTs got benchmarked against shadowing at some point.

Cheers,
Mark

--
> Just a question. What use is a unicycle with no seat? And no pedals!
Me: To answer a question with a question: What use is a skateboard?
> Skateboards have wheels.
Me: My wheel has a wheel!
Anthony Liguori
2006-Jan-11 16:25 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Hi Keir,

Keir Fraser wrote:

> On an architecture where VP is cheaper to implement than on x86, it
> may well make sense to do that in preference to P2M. As you say, it
> makes certain future extensions less of a pain to implement.

Is VP on x86 expensive in terms of performance or complexity?

I imagine that you would have to always have shadow paging enabled, but you could still do bulk updates a la writable page tables, so the performance cost should be minimal I would think.

Trying to understand the memory system in more detail, so any additional info is much appreciated :-)

Thanks,

Anthony Liguori
Keir Fraser
2006-Jan-11 16:38 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On 11 Jan 2006, at 16:25, Anthony Liguori wrote:

> Is VP on x86 expensive in terms of performance or complexity?
>
> I imagine that you would have to always have shadow paging enabled,
> but you could still do bulk updates a la writable page tables, so the
> performance cost should be minimal I would think.
>
> Trying to understand the memory system in more detail, so any
> additional info is much appreciated :-)

Shadow page tables do have a measurable overhead, although it's not *that* big for most workloads. We already support a shadow-translate mode (well, the xenlinux support for it may be broken right now, but it's worked in the past) for paravirt guests, and various people researching new xen features want to make use of it. I can imagine that we will support both modes even on x86 at some point in the future, and users can make the features/performance tradeoff.

 -- Keir
Anthony Liguori
2006-Jan-11 16:41 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Mark Williamson wrote:
>> I imagine that you would have to always have shadow paging enabled,
>> but you could still do bulk updates a la writable page tables, so the
>> performance cost should be minimal, I would think.
>>
>> Trying to understand the memory system in more detail, so any
>> additional info is much appreciated :-)
>
> I don't see why that couldn't perform decently, although it'd have
> more overhead than allowing the guest to manage its pagetables
> directly... I *thought* this was intended to be supported at some
> point, but I'm not sure if it's been needed yet. Others may have more
> concrete numbers for the performance - I think writable PTs got
> benchmarked against shadowing at some point.

Just to be thorough, was the shadow paging code a "pure" shadow page
table, where every PTE write trapped to the hypervisor, or were bulk PMD
updates sent to the hypervisor?

I'm surprised there would be a measurable difference with shadow paging,
as it should only require a potential allocation (which could be
fast-pathed) and, in the normal case, a couple of extra reads/writes. I
would think that cost would be overshadowed by the original cost of the
context switch.

Of course, I guess it wouldn't be that much of a shock to me that the
overhead is at least measurable...

Regards,

Anthony Liguori
> Just to be thorough, was the shadow paging code a "pure" shadow page
> table, where every PTE write trapped to the hypervisor, or were bulk
> PMD updates sent to the hypervisor?

All of Xen's pagetable options are able to do high-performance bulk
updates (though it's actually typically more important to optimize for
the demand-fault path). There was some quite extensive benchmarking done
~9 months back, and we're hoping to write it up and submit it somewhere.
The algorithms have evolved a bit since, so we need to rerun things.

> I'm surprised there would be a measurable difference with shadow
> paging, as it should only require a potential allocation (which could
> be fast-pathed) and, in the normal case, a couple of extra
> reads/writes. I would think that cost would be overshadowed by the
> original cost of the context switch.

Hint: you need to propagate dirty and accessed bits back to the guest
pagetable.

> Of course, I guess it wouldn't be that much of a shock to me that the
> overhead is at least measurable...

It's certainly measurable, and it certainly dominates the virtualization
overhead of some workloads.

Ian
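For illustration, here is a minimal sketch of the cost being hinted at
(names invented; this is not Xen's actual shadow code). The hardware sets
Accessed/Dirty bits only in the shadow entry, so before a shadow PTE is
dropped or resynced those bits must be copied back into the guest PTE, or
the guest's page-replacement and writeback decisions break:

    /* Hypothetical sketch: fold hardware-set A/D bits from a shadow PTE
     * back into the corresponding guest PTE before discarding the shadow. */
    #define _PAGE_ACCESSED 0x020UL
    #define _PAGE_DIRTY    0x040UL

    typedef unsigned long pte_t;

    static void sync_ad_bits(pte_t *guest_pte, pte_t shadow_pte)
    {
        /* Everything else in the shadow was derived from the guest
         * entry; only the A/D bits can legitimately differ. */
        *guest_pte |= shadow_pte & (_PAGE_ACCESSED | _PAGE_DIRTY);
    }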
Anthony Liguori
2006-Jan-11 17:38 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
Ian Pratt wrote:
> Hint: you need to propagate dirty and accessed bits back to the guest
> pagetable.

Ahh, I see now. Thanks :-)

Regards,

Anthony Liguori

> > Of course, I guess it wouldn't be that much of a shock to me that
> > the overhead is at least measurable...
>
> It's certainly measurable, and it certainly dominates the
> virtualization overhead of some workloads.
>
> Ian
Hollis Blanchard
2006-Jan-11 21:16 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
On Wed, 2006-01-11 at 16:22 +0000, Mark Williamson wrote:
> IIRC, PPC also performs P-to-M translations in the hypervisor, but I
> vaguely recall that happening during an explicit pagetable update
> hypercall - kind of a middle road between the x86 and IA64
> approaches... Some PPC guy may jump in and correct me at this point,
> though ;-)

It's pretty simple: for Xen/x86, the kernel does translation and the
hypervisor does validation. For PAPR on PPC hardware, the hypervisor
does both translation and validation. This is done for every mapping
hcall: the domain makes an hcall to map physical address P, and the
hypervisor translates to machine address M and allows or rejects the
request.

Page fault exceptions are delivered by the processor to the domain (not
the hypervisor), which reacts by making a mapping hcall.

--
Hollis Blanchard
IBM Linux Technology Center
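For illustration, a sketch of that flow under assumed names (h_enter is
modelled loosely on PAPR's H_ENTER; p2m_lookup, owns_frame and hpt_insert
are hypothetical helpers): the guest asks to map guest-physical P, and
the hypervisor performs both the translation and the validation.

    struct domain;

    typedef unsigned long paddr_t;   /* guest-physical address */
    typedef unsigned long maddr_t;   /* machine address */

    extern maddr_t p2m_lookup(struct domain *d, paddr_t p);
    extern int     owns_frame(struct domain *d, maddr_t m);
    extern void    hpt_insert(maddr_t m, unsigned long pte_flags);

    #define H_PARAMETER (-4L)        /* illustrative error code */

    long h_enter(struct domain *d, paddr_t p, unsigned long pte_flags)
    {
        maddr_t m = p2m_lookup(d, p);  /* translation: in the hypervisor */

        if (!owns_frame(d, m))         /* validation: also the hypervisor */
            return H_PARAMETER;        /* reject the request */

        hpt_insert(m, pte_flags);      /* install the machine mapping */
        return 0;
    }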
Hollis Blanchard
2006-Jan-11 21:36 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
On Tue, 2006-01-10 at 16:39 -0800, Magenheimer, Dan (HP Labs Fort
Collins) wrote:
> > So ia64 dom0 physical 0 is machine 0? Where does Xen live in machine
> > space?
> >
> > PowerPC exception handlers are architecturally hardcoded to the
> > first couple pages of memory, so Xen needs to live there. Linux
> > expects it is booting at 0 of course, so dom0 runs in an offset
> > physical address space.
>
> On ia64, Xen (and Linux when booting natively) is relocatable.
> Machine address 0 is not special on ia64 like it is on PowerPC.

Right, so P==M for dom0 (or any domain) will not work on PowerPC.

> Per the previous exchange with Anthony, there are many advantages
> to being able to move memory around invisibly to domains, which
> is easy with VP and much harder with P2M. The current debate on
> Xen/ia64 is just for domain0 but it could expand...

As far as I can see, dom0 must be aware of the machine address space, so
that means P2M for PowerPC. dom0 is a special case: do you really need
to worry about migrating dom0, or memory compacting with other domains?

As for the question of domU being VP or P2M, I see no reason it
shouldn't be VP. IO-capable domUs (driver domains) could be VP with
proper IOMMU support. The PowerPC PAPR and Xen/ia64 implementations
demonstrate that this works...

--
Hollis Blanchard
IBM Linux Technology Center
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-12 00:48 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> > On ia64, Xen (and Linux when booting natively) is relocatable.
> > Machine address 0 is not special on ia64 like it is on PowerPC.
>
> Right, so P==M for dom0 (or any domain) will not work on PowerPC.

Are machine addresses 0-n the only range that is special? And can one
safely assume that DMA will never occur in this range? If so, then a
single "special" mapping in the hypervisor could get around this. While
I suppose this is more P~=M than strictly P==M, it would seem a
reasonable alternative to major Linux changes.

> > Per the previous exchange with Anthony, there are many advantages
> > to being able to move memory around invisibly to domains, which
> > is easy with VP and much harder with P2M. The current debate on
> > Xen/ia64 is just for domain0 but it could expand...
>
> As far as I can see, dom0 must be aware of the machine address space,
> so that means P2M for PowerPC. dom0 is a special case: do you really
> need to worry about migrating dom0, or memory compacting with other
> domains?

No; migrating dom0 or any driver domain with direct device access is
unreasonable, at least unless all device access is virtualized (e.g.
Infiniband?). I view domain0 as closer to a semi-privileged extension of
Xen. Not sure what you mean by memory compacting...

> As for the question of domU being VP or P2M, I see no reason it
> shouldn't be VP. IO-capable domUs (driver domains) could be VP with
> proper IOMMU support. The PowerPC PAPR and Xen/ia64 implementations
> demonstrate that this works...

Ignoring the page table problems on x86 (which VMware demonstrates are
more of a performance issue than a functional issue), if DMA can be
handled invisibly, I think everyone agrees that VP has significant
advantages over either P==M or P2M. But to clarify, Xen/ia64 domU is
currently VP only because it doesn't do DMA. Driver domains will
complicate this.

Dan
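For what it's worth, the "P~=M" idea above sketches out very simply; the
constants here are invented placeholders, not real PowerPC or Xen values:

    /* Identity-map everything except the low range that PowerPC reserves
     * for exception vectors (and hence for Xen); dom0 sees that range
     * aliased to one relocated region. */
    #define XEN_LOW_RESERVED  0x4000UL      /* hypothetical: vectors + Xen */
    #define DOM0_LOW_ALIAS    0x10000000UL  /* hypothetical relocated copy */

    static unsigned long dom0_p_to_m(unsigned long p)
    {
        /* One special mapping for the reserved range; P==M elsewhere. */
        return (p < XEN_LOW_RESERVED) ? (DOM0_LOW_ALIAS + p) : p;
    }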
Tian, Kevin
2006-Jan-12 02:44 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> From: Mark Williamson
> Sent: 12 January 2006 00:23
>
> > Is VP on x86 expensive in terms of performance or complexity?
>
> One nasty thing for VP on x86 is the compulsory hardware PT walker -
> IA64 allows the hypervisor to handle TLB fills on behalf of a guest,
> so that it can perform phys-to-machine translation. IA64 has a
> hardware PT walker, but you aren't *forced* to use it.

To make it clearer, this hardware PT walker on IA64 is not like the
normal multi-level PT walker on x86. Instead it walks a virtually linear
table or a hash table, which is configurable. ;-)

Thanks,
Kevin
Mark Williamson
2006-Jan-16 15:52 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
> To make it clearer, this hardware PT walker on IA64 is not like the
> normal multi-level PT walker on x86. Instead it walks a virtually
> linear table or a hash table, which is configurable. ;-)

Am I right in thinking it's also possible to implement a
"software-filled TLB" on IA64? (As a fallback for when the hardware
assist fails?)

Cheers,
Mark
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-16 22:56 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> > To make it clearer, this hardware PT walker on IA64 is not like the
> > normal multi-level PT walker on x86. Instead it walks a virtually
> > linear table or a hash table, which is configurable. ;-)
>
> Am I right in thinking it's also possible to implement a
> "software-filled TLB" on IA64? (As a fallback for when the hardware
> assist fails?)

Not only possible, but normal. If there is a TLB miss and a VHPT
(virtual hashed page table) miss, software fills both the TLB and the
VHPT.

Is that what you meant?

Dan
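For illustration, that normal sequence reduces to something like the
following (helper names are hypothetical stand-ins, not xen-ia64 code):

    typedef unsigned long pte_t;

    extern pte_t resolve_va(unsigned long va);           /* walk OS page tables */
    extern void  vhpt_insert(unsigned long va, pte_t p); /* fill the VHPT...    */
    extern void  tlb_insert(unsigned long va, pte_t p);  /* ...and the TLB (itc) */

    /* Entered only after both the hardware TLB and the VHPT have missed. */
    void tlb_and_vhpt_miss_handler(unsigned long va)
    {
        pte_t pte = resolve_va(va); /* software supplies the translation */
        vhpt_insert(va, pte);       /* so the hardware walker hits next time */
        tlb_insert(va, pte);        /* and the faulting access can retry now */
    }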
Mark Williamson
2006-Jan-17 02:47 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
> Not only possible, but normal. If there is a TLB miss and a VHPT
> (virtual hashed page table) miss, software fills both the TLB and the
> VHPT.
>
> Is that what you meant?

Yep, that's exactly what I thought happened :-)

IIRC, you said you don't bother with the guest VHPT, right? So
presumably you reflect TLB misses to the guest and intercept its TLB
fill instruction, apply the P2M translation, then add it to *Xen*'s VHPT
and fill the TLB correctly?

I know I've followed some of these discussions before - just a bit rusty
now ;-)

Cheers,
Mark
Magenheimer, Dan (HP Labs Fort Collins)
2006-Jan-17 03:03 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> > Not only possible, but normal. If there is a TLB miss and a VHPT
> > (virtual hashed page table) miss, software fills both the TLB and
> > the VHPT.
> >
> > Is that what you meant?
>
> Yep, that's exactly what I thought happened :-)
>
> IIRC, you said you don't bother with the guest VHPT, right? So
> presumably you reflect TLB misses to the guest and intercept its TLB
> fill instruction, apply the P2M translation, then add it to *Xen*'s
> VHPT and fill the TLB correctly?
>
> I know I've followed some of these discussions before - just a bit
> rusty now ;-)

Exactly... except for one nice shortcut that Matt Chapman added. Since
the VHPT is architected and the guest is expecting that it may be
walked, when Xen intercepts the initial TLB miss, it can first look in
the guest VHPT to resolve the miss (and add it to Xen's VHPT and fill
the TLB) rather than reflect the TLB miss to the guest. Only if the
translation isn't found in the guest VHPT (or if looking for it -- a
user_access -- causes another TLB miss) is the TLB miss reflected to the
guest.

Thus, guests have the benefit not only of the hardware TLB and Xen's
VHPT but also of their own VHPT.

Dan
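For illustration, the shortcut amounts to putting a fast path in front of
the reflection (all names here are hypothetical, not the actual xen-ia64
handlers):

    struct vcpu;
    typedef unsigned long pte_t;

    extern int   guest_vhpt_lookup(struct vcpu *v, unsigned long va,
                                   pte_t *gpte);            /* may itself fault */
    extern pte_t p2m_translate(struct vcpu *v, pte_t gpte); /* phys -> machine */
    extern void  xen_vhpt_insert(unsigned long va, pte_t mpte);
    extern void  tlb_insert(unsigned long va, pte_t mpte);
    extern void  reflect_to_guest(struct vcpu *v, unsigned long va);

    void xen_tlb_miss(struct vcpu *v, unsigned long va)
    {
        pte_t gpte;

        if (guest_vhpt_lookup(v, va, &gpte) == 0) {
            pte_t mpte = p2m_translate(v, gpte); /* apply P2M first */
            xen_vhpt_insert(va, mpte);           /* fill Xen's VHPT */
            tlb_insert(va, mpte);                /* and the TLB */
            return;          /* miss resolved without entering the guest */
        }
        reflect_to_guest(v, va); /* guest's own miss handler runs instead */
    }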
Mark Williamson
2006-Jan-17 03:16 UTC
Re: [Xen-devel] Essay on an important Xen decision (long)
> > I know I've followed some of these discussions before - just a bit
> > rusty now ;-)
>
> Exactly... except for one nice shortcut that Matt Chapman added. Since
> the VHPT is architected and the guest is expecting that it may be
> walked, when Xen intercepts the initial TLB miss, it can first look in
> the guest VHPT to resolve the miss (and add it to Xen's VHPT and fill
> the TLB) rather than reflect the TLB miss to the guest. Only if the
> translation isn't found in the guest VHPT (or if looking for it -- a
> user_access -- causes another TLB miss) is the TLB miss reflected to
> the guest.
>
> Thus, guests have the benefit not only of the hardware TLB and Xen's
> VHPT but also of their own VHPT.

I wondered if that'd be useful to do. I guess Linux would naturally try
to fill the VHPT eagerly as a performance optimisation, so this should
work quite nicely - you'd only get the extra cost of reflecting the
fault at times when even native Linux would have missed the VHPT. Sweet!

And the real VHPT is per (logical) CPU? I guess walking the guest VHPT
additionally gives you (effectively) a VHPT per virtual processor, but
with the cost coming out of domain memory. The fast-path VHPT in Xen
doesn't need to have such a high hit rate as a result, I assume.

Had you evaluated the costs of having the guest explicitly update Xen's
VHPT? (Or at least hint that an update was necessary for some reason?)

Cheers,
Mark
Tian, Kevin
2006-Jan-17 04:11 UTC
RE: [Xen-devel] Essay on an important Xen decision (long)
> From: Mark Williamson [mailto:mark.williamson@cl.cam.ac.uk]
> Sent: 17 January 2006 11:17
>
> I wondered if that'd be useful to do. I guess Linux would naturally
> try to fill the VHPT eagerly as a performance optimisation, so this
> should work quite nicely - you'd only get the extra cost of reflecting
> the fault at times when even native Linux would have missed the VHPT.
> Sweet!
>
> And the real VHPT is per (logical) CPU? I guess walking the guest VHPT
> additionally gives you (effectively) a VHPT per virtual processor, but
> with the cost coming out of domain memory. The fast-path VHPT in Xen
> doesn't need to have such a high hit rate as a result, I assume.

You capture the point there. ;-)

There are currently two solutions co-existing in xen-ia64: a per-LP
(logical processor) VHPT with a simplified vTLB, and a per-VP (virtual
processor) VHPT with a hash vTLB. The former is used by dom0/domU, while
the latter is used for domVTI. The vTLB is the pool that tracks guest
TLB insertion/purge operations, and thus behaves like a shadow of the
machine TLB. The simplified vTLB meets only the minimal architectural
requirements -- 8 DTR/ITR entries and 1 DTC/ITC -- and thus has a lower
hit rate. The hash vTLB is a hash-distributed table with collision
support, which requires more memory but gives a higher hit rate.

Right now it is more urgent to merge the two solutions into something
general than to settle the per-VP versus per-LP strategy, which has been
discussed many times before and is actually not that obvious without a
general solution and benchmark data. We'll have a discussion on this
topic in tomorrow's summit.

> Had you evaluated the costs of having the guest explicitly update
> Xen's VHPT? (Or at least hint that an update was necessary for some
> reason?)

Letting the guest explicitly update Xen's VHPT has several obvious
limitations:

- The VHPT on IA64 has two formats: short and long. To support
  different guest OSes, Xen has to construct a long-format VHPT. Linux
  currently uses the short format, so this would mean a lot of
  modification for xenlinux to operate on Xen's long-format VHPT
  directly. (See the sketch after this message.)

- It also conflicts with the current region-id virtualization policy.
  On xen/ia64, a region id describes an address space, and it is
  virtualized: fewer bits are exposed to xenlinux than the machine
  actually supports. If xenlinux directly ran the hash algorithm on a
  virtualized region id, the result would be meaningless.

Thanks,
Kevin
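For reference, the two architected entry formats behind Kevin's first
point look roughly like this (field comments abbreviated; see the ia64
architecture manuals for the exact layouts):

    /* Short format (8 bytes): effectively a bare PTE, used by Linux in
     * a per-region virtually linear table. */
    struct vhpt_short {
        unsigned long pte;   /* translation + access rights */
    };

    /* Long format (32 bytes): hashed and self-tagging, so one global
     * table can serve guests with different address-space layouts --
     * which is why Xen would need it. */
    struct vhpt_long {
        unsigned long pte;   /* translation + access rights */
        unsigned long itir;  /* page size, protection key */
        unsigned long tag;   /* identifies which VA/RID hashed here */
        unsigned long avail; /* ignored by the hardware walker */
    };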