* Executive summary

The number of event channels available to dom0 is currently one of the biggest limitations on scaling up the number of VMs which can be created on a single system. There are two alternative implementations we could choose: one is ready now, the other is potentially technically superior, but will not be ready for the 4.3 release.

The core question we need to ask the community: How important is lifting the event channel scalability limit for 4.3? Will waiting until 4.4 limit the uptake of the Xen platform?

* The issue

The existing event channel mechanism for PV guests is implemented as a 2-level bit array. This limits the total number of event channels to word_size ^ 2, which is 1024 for 32-bit guests and 4096 for 64-bit guests.

This sounds like a lot, until you consider that in a typical system, each VM needs 4 or more event channels in domain 0. This means that for a 32-bit dom0, there is a theoretical maximum of 256 guests -- and in practice it's more like 180 or so, because of event channels required for other things. XenServer already has VDI customers who need more VMs than this.

* The dilemma

When we began the 4.3 release cycle, this was one of the items we identified as a key feature we needed for 4.3. Wei Liu started work on an extension of the existing implementation, allowing 3 levels of event channels. The draft of this is ready, and needs only a last bit of polishing and bug-chasing before it can be accepted.

However, several months ago, David Vrabel came up with an alternate design which is in theory more scalable, based on queues of linked lists (which we have internally been calling "FIFO" for short). David has been working on the implementation since, and has a draft prototype; but it is in no shape to be included in 4.3.

There are some things that are attractive about the second solution, including the flexible assignment of interrupt priorities, ease of scalability, and potentially even the FIFO ordering of interrupt delivery.

The question at hand, then, is whether to take what we have in the 3-level implementation for 4.3, or wait to see how the FIFO implementation turns out (taking either it or the 3-level implementation in 4.4).

* The solution in hand: 3-level event channels

The basic idea behind 3-level event channels is to extend the existing 2-level implementation to 3 levels. Going to 3 levels would give us 32k event channels for 32-bit guests, and 256k for 64-bit guests.

One of the advantages of this method is that, since it is similar to the existing method, the general concepts and race conditions are fairly well understood and tested.

One of the disadvantages it inherits from the 2-level event channels is the lack of priority. In the initial implementation of event channels, priority was handled by event channel order: scans for events always started at 0 and went upwards. However, this was not very scalable, as lower-numbered events could easily lock out higher-numbered events entirely; and frequently "lower-numbered" simply meant "created earlier". Event channels were forced into a priority even if one was not wanted.

So the implementation was tweaked so that scans don't start at 0, but continue where the last scan left off. This means earlier events are no longer prioritized, removing the starvation issue, but at the cost of removing all event priorities. Certain events, like the timer event, are special-cased to be always checked, but this is rather a hack and not very scalable or flexible.

One thing that should be noted is that adding the extra level is envisioned to be used only by guests that need the extended event channel space, such as dom0 and driver domains; domUs will continue to use the 2-level version.
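For the curious, the capacity arithmetic behind these numbers can be checked with a few lines of C (an illustration of the math only, not Xen source): a k-level bit array with word_size bits per word can address word_size^k channels.

  #include <stdio.h>

  /* Number of event channels representable by a k-level bit array,
   * where each level fans out by the guest word size in bits. */
  static unsigned long channels(unsigned long word_bits, unsigned levels)
  {
      unsigned long n = 1;
      while (levels--)
          n *= word_bits;
      return n;
  }

  int main(void)
  {
      printf("32-bit, 2 levels: %lu\n", channels(32, 2)); /* 1024 */
      printf("64-bit, 2 levels: %lu\n", channels(64, 2)); /* 4096 */
      printf("32-bit, 3 levels: %lu\n", channels(32, 3)); /* 32768 (32k) */
      printf("64-bit, 3 levels: %lu\n", channels(64, 3)); /* 262144 (256k) */
      return 0;
  }

At 4 or more dom0-facing channels per VM, the 1024-channel 32-bit limit yields the roughly-256-guest ceiling described above.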
* The solution close at hand: FIFO event channels

The FIFO solution makes event delivery a matter of adding items to a highly structured linked list. The interface design has a theoretical maximum of 2^28 event channels; the current implementation is limited to 2^17 (131,072). The limit is the same for both 32-bit and 64-bit kernels.

One of the key design advantages of the FIFO approach is the ability to assign an arbitrary priority to any event. There are 16 priorities available, with one queue per priority. Higher-priority queues are handled before lower-priority queues, but events within a queue are handled in FIFO order.

Another potential advantage is the FIFO ordering itself. With the current event channel implementation, one can construct scenarios where, even among events of the same priority, clusters of events can lock out others depending on where they sit in the bit array and how many of them there are. FIFO delivery solves this by handling events within the same priority strictly in the order in which they were raised. It's not clear yet, however, whether this has a measurable impact on performance.

One of the potential disadvantages of the FIFO solution is the amount of memory that it requires to be mapped into the Xen address space. The FIFO solution requires an entire word per event channel; a reasonably configured system might have up to 128 Xen-mapped pages per dom0 or domU. On the other hand, this number can be scaled at a fine-grained level and limited by the toolstack; a typical domU would require only one page mapped in the hypervisor.

By comparison, the 3-level solution requires only two bits per event channel. Any domain using the extra level would require exactly 16 pages for 64-bit domains, and 2 pages for 32-bit domains. We would expect this to include dom0 and any driver domains, while domUs would continue using 2-level event channels (and thus require no extra pages to be mapped).
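As a sanity check on those page counts, here is the arithmetic, assuming 4 KiB pages and a 4-byte event word for FIFO (a back-of-the-envelope sketch, not taken from either patch series):

  FIFO:            2^17 events x 4 bytes = 512 KiB = 128 pages
  3-level, 64-bit: 256k events x 2 bits  =  64 KiB =  16 pages
  3-level, 32-bit:  32k events x 2 bits  =   8 KiB =   2 pages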
* Considerations

There are a number of additional considerations to take into account.

The first is that the hypervisor maintainers have made it clear that once 3-level event channels are accepted, FIFO will have a higher bar to clear for acceptance. That is, if we wait for the 4.4 timeframe before choosing one to accept, then FIFO will only need to be marginally preferable to 3-level to be accepted. However, if we accept the 3-level implementation for 4.3, then FIFO will need to demonstrate that it is significantly better before it will be accepted.

We are not yet aware of any companies that are blocked on this feature. Citrix XenServer customers using Citrix's VDI solution need to be able to run more than 200 guests; however, because XenServer controls both the kernel and hypervisor sides, Citrix can introduce temporary, non-backwards- or forwards-compatible changes to work around the limitation, and so is not blocked. Oracle and SuSE have not indicated that this is a feature they are in dire need of. Most cloud deployments that we know of -- even extremely large ones like Amazon or Rackspace -- use large numbers of relatively inexpensive computers, and so typically do not need to run more than 200 VMs per physical host.

Another factor to consider is that we are contemplating a shorter release cadence for 4.4 -- 6 months or possibly less. That would reduce the impact of delaying the event channel scalability feature.

* What we need to know

What we're missing in order to make an informed decision is voices from the community: If we delay the event channel scalability feature until 4.4, how likely is this to be an issue? Are there current or potential users of Xen who need to scale past 200 VMs on a single host, and who would end up choosing another hypervisor if this feature were delayed?

Thank you for your time and input.

-George Dunlap,
4.3 Release manager
Anil Madhavapeddy
2013-Mar-27 19:36 UTC
Re: Request for input: Extended event channel support
On 27 Mar 2013, at 11:23, George Dunlap <dunlapg@umich.edu> wrote:
> The FIFO solution makes event delivery a matter of adding items to a
> highly structured linked list. The interface design has a theoretical
> maximum of 2^28 event channels; the current implementation is limited
> to 2^17 (131,072). The limit is the same for both 32-bit and 64-bit
> kernels.

Is there any reason for such a low default? If I'm not mistaken, every guest needs at least 2 event channels (console, xenstore) and probably has two more for a net and disk device.

With stub-domains in the mix, we could easily imagine running 25,000 VMs with a couple of megabytes of RAM each using Mirage (which can boot very low-memory guests without too much trouble). This does run into other problems with CPU scheduling and device scalability, but it would be nice if any proposed event channel upgrade went well above this level rather than scraping it.

I personally prefer the 4.3 solution (despite the priority hack for the timers) just because the existing limitation is so very trivial to hit. However, I have no view on the level of technical debt that would be incurred if it subsequently required switching to the FIFO solution in 4.4, causing yet another round of upgrades. That's your problem; I just want the extra domains :-)

-anil
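To put numbers on that (a rough estimate using Anil's own figures, not a measurement): at roughly 4 dom0-facing channels per guest,

  25,000 VMs x 4 channels = 100,000 channels

which already brushes against the current FIFO limit of 2^17 = 131,072 -- hence "scraping it".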
David Vrabel
2013-Mar-27 21:53 UTC
Re: [Xen-devel] Request for input: Extended event channel support
On 27/03/2013 19:36, Anil Madhavapeddy wrote:
> Is there any reason for such a low default? If I'm not mistaken,
> every guest needs at least 2 event channels (console, xenstore) and
> probably has two more for a net and disk device.

131,072 seemed high enough to me, but I'd forgotten about the Mirage use case.

This can be trivially raised to 2^19 (524,288). Beyond that, the implementation becomes slightly more complex, as the pointers to the event array pages no longer fit in a single page.

> With stub-domains in the mix, we could easily imagine running 25,000
> VMs with a couple of megabytes of RAM each using Mirage (which can
> boot very low-memory guests without too much trouble).

Having said that, with 25,000 VMs it would seem sensible to disaggregate things like the console and xenstore (in addition to the network and block backends), thus reducing the number of event channels needed by any single domain.

David
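David's 2^19 boundary follows from simple arithmetic, assuming 4 KiB pages, a 4-byte word per event, and 8-byte pointers to the event array pages (a sketch of the reasoning, not taken from the patches):

  events per page:       4096 bytes / 4 bytes = 1024
  pages for 2^19 events: 524,288 / 1024       = 512
  pointer table:         512 x 8 bytes        = 4096 bytes = exactly one page

So 2^19 events is the largest array whose page pointers still fit in a single page; going further would require another level of indirection.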
Anil Madhavapeddy
2013-Mar-27 22:28 UTC
Re: Request for input: Extended event channel support
On 27 Mar 2013, at 21:53, David Vrabel <dvrabel@cantab.net> wrote:
> 131,072 seemed high enough to me, but I'd forgotten about the Mirage
> use case.
>
> This can be trivially raised to 2^19 (524,288). Beyond that, the
> implementation becomes slightly more complex, as the pointers to the
> event array pages no longer fit in a single page.

Makes sense.

> Having said that, with 25,000 VMs it would seem sensible to
> disaggregate things like the console and xenstore (in addition to the
> network and block backends), thus reducing the number of event
> channels needed by any single domain.

Yeah, indeed; this should be pretty easy to do, and would let the existing 2^17 be enough for a long time too. We'd need to think a bit about a distributed xenstored to avoid having one hotspot servicing so many VMs. One nice thing about the OCaml xenstored is that it should be possible to make an explicitly distributed implementation of the protocol. The data structure is already based around immutable trees, so it's a matter of figuring out where to put the consensus logic (probably around /local/domain/*).

-anil
Wei Liu
2013-Mar-27 22:31 UTC
Re: [Xen-devel] Request for input: Extended event channel support
On Wed, Mar 27, 2013 at 9:53 PM, David Vrabel <dvrabel@cantab.net> wrote:
> This can be trivially raised to 2^19 (524,288). Beyond that, the
> implementation becomes slightly more complex, as the pointers to the
> event array pages no longer fit in a single page.

Then that would require 512 pages mapped in Xen in the worst case, plus

>> With stub-domains in the mix, we could easily imagine running 25,000
>> VMs with a couple of megabytes of RAM each using Mirage (which can
>> boot very low-memory guests without too much trouble).

25,000 pages for domUs if each domU uses this ABI as well. This might require bumping the global mapping space in Xen, or we can restrict domUs to the default 2-level ABI to solve this problem. But let's not worry about future things for now.

Wei.
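Wei's figures work out as follows, again assuming 4 KiB pages and a 4-byte event word (an illustrative estimate, not from the patches):

  worst case, one domain at 2^19 events: 524,288 x 4 bytes = 2 MiB = 512 pages
  25,000 domUs at one event-array page each: 25,000 pages, roughly 98 MiB

of space that Xen would need to keep mapped, which is why restricting ordinary domUs to the 2-level ABI (no extra mapped pages) is attractive.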
Konrad Rzeszutek Wilk
2013-Mar-28 01:56 UTC
Re: Request for input: Extended event channel support
On Wed, Mar 27, 2013 at 11:23:23AM +0000, George Dunlap wrote:
> So the implementation was tweaked so that scans don't start at 0, but
> continue where the last scan left off. This means earlier events are
> no longer prioritized, removing the starvation issue, but at the cost
> of removing all event priorities. Certain events, like the timer
> event, are special-cased to be always checked, but this is rather a
> hack and not very scalable or flexible.

Hm, I actually think that is not in the upstream kernel at all. That would explain why, on a very heavily busy guest, the "hrtimer: interrupt took XXxXXXXxx ns" message is printed.

Is this patch somewhere available?
On Thu, Mar 28, 2013 at 1:56 AM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> Hm, I actually think that is not in the upstream kernel at all. That
> would explain why, on a very heavily busy guest, the "hrtimer:
> interrupt took XXxXXXXxx ns" message is printed.
>
> Is this patch somewhere available?

I think it was David who told me this -- maybe there is such a hack in the "classic xen" kernel we're using in XenServer?

-George
>>> On 28.03.13 at 12:10, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Thu, Mar 28, 2013 at 1:56 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> Hm, I actually think that is not in the upstream kernel at all.
>>
>> Is this patch somewhere available?
>
> I think it was David who told me this -- maybe there is such a hack in
> the "classic xen" kernel we're using in XenServer?

Indeed - see 1038:a66a7c64b1d0 on linux-2.6.18-xen.hg.

Jan
Felipe Franciosi
2013-Mar-28 12:51 UTC
Re: Request for input: Extended event channel support
Anil Madhavapeddy wrote:
> If I'm not mistaken, every guest needs at least 2 event channels
> (console, xenstore) and probably has two more for a net and disk
> device.

Presumably for vCPUs as well, IINM?

Felipe
Anil Madhavapeddy
2013-Mar-28 12:54 UTC
Re: Request for input: Extended event channel support
On 28 Mar 2013, at 12:51, Felipe Franciosi <felipe.franciosi@citrix.com> wrote:
>> If I'm not mistaken, every guest needs at least 2 event channels
>> (console, xenstore) and probably has two more for a net and disk
>> device.
>
> Presumably for vCPUs as well, IINM?

Yes, except that in Mirage's case we're single-vCPU only, and use multiple VMs to act as parallel processes with explicit message passing. But we would still need an event channel for the vchan shared ring in this case too...

-anil
Felipe Franciosi
2013-Mar-28 13:02 UTC
Re: [Xen-devel] Request for input: Extended event channel support
Anil Madhavapeddy wrote:
>> Presumably for vCPUs as well, IINM?
>
> Yes, except that in Mirage's case we're single-vCPU only, and use
> multiple VMs to act as parallel processes with explicit message
> passing.

There's also the buffered IO event channel, but I'm pretty sure this is only for HVM, so it shouldn't affect the Mirage use case. Just mentioning it in case someone out there is reading this and working out numbers for HVM guests. :)

http://lists.xen.org/archives/html/xen-changelog/2011-11/msg00139.html

Cheers,
Felipe
Konrad Rzeszutek Wilk
2013-Mar-29 13:05 UTC
Re: [Xen-devel] Request for input: Extended event channel support
> * What we need to know
>
> What we're missing in order to make an informed decision is voices
> from the community: If we delay the event channel scalability feature
> until 4.4, how likely is this to be an issue? Are there current or
> potential users of Xen who need to scale past 200 VMs on a single
> host, and who would end up choosing another hypervisor if this
> feature were delayed?

For this to work you also need the Linux-side patches. That means that if you want to hit the v3.10 merge window, you have until April 15th to get them in. The reason is that I am out from April 20th, and the merge window will probably open on May 1st. We need at least one week to work out any bugs once it goes into linux-next - hence the April 15th deadline.

Technically speaking, the FIFO design looks more appealing than the three-level events, but that is a subjective opinion. The reality is that what should really be determined is which one will give better performance. From a design perspective it looks as if FIFO is the clear winner, but perhaps not - I only briefly looked over the paper.

Anyhow, I am leaning towards the FIFO - but I think that if there are existing people who want this functionality _right now_, then the 3-level event channels would offer a stopgap option. And they can apply it to their hypervisor + Linux by hand, right?
Jan Beulich
2013-Apr-02 07:44 UTC
Re: [Xen-devel] Request for input: Extended event channel support
>>> On 29.03.13 at 14:05, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> For this to work you also need the Linux-side patches. That means
> that if you want to hit the v3.10 merge window, you have until
> April 15th to get them in. The reason is that I am out from
> April 20th, and the merge window will probably open on May 1st.

I don't think upstream inclusion of the Linux-side patches is a requirement here. The patches need to exist (or else the code can't be tested), but there's no need for them to be in 3.10 as far as the interface selection is concerned.

Jan
Konrad Rzeszutek Wilk
2013-Apr-02 14:20 UTC
Re: Request for input: Extended event channel support
On Tue, Apr 02, 2013 at 08:44:47AM +0100, Jan Beulich wrote:
> I don't think upstream inclusion of the Linux-side patches is a
> requirement here. The patches need to exist (or else the code
> can't be tested), but there's no need for them to be in 3.10 as
> far as the interface selection is concerned.

OK.
On Wed, Mar 27, 2013 at 11:23 AM, George Dunlap <dunlapg@umich.edu> wrote:
> The core question we need to ask the community: How important is
> lifting the event channel scalability limit for 4.3? Will waiting
> until 4.4 limit the uptake of the Xen platform?

So far the only one who has indicated a preference either way is Anil, who is impatient to be rid of the limit on the number of tiny Mirage VMs he can create. :-)

I think overall, then, that I'm leaning towards recommending that we put the decision off until 4.4.

-George
Ian Campbell
2013-Apr-10 10:45 UTC
Re: [Xen-devel] Request for input: Extended event channel support
On Thu, 2013-03-28 at 12:51 +0000, Felipe Franciosi wrote:
>> If I'm not mistaken, every guest needs at least 2 event channels
>> (console, xenstore) and probably has two more for a net and disk
>> device.
>
> Presumably for vCPUs as well, IINM?

Aren't those (the vcpu IPI event channels, timers, etc.) internal to the guest, though? What we want to count here are event channels with an endpoint inside dom0.

Ian.
Ian Campbell
2013-Apr-10 10:45 UTC
Re: [Xen-devel] Request for input: Extended event channel support
On Thu, 2013-03-28 at 12:54 +0000, Anil Madhavapeddy wrote:
> Yes, except that in Mirage's case we're single-vCPU only, and use
> multiple VMs to act as parallel processes with explicit message
> passing.
>
> But we would still need an event channel for the vchan shared ring in
> this case too...

That would be between two Mirage guests, though, unless you are envisaging Mirage "processes" with hundreds of thousands of "threads"?

Ian.
Ian Campbell
2013-Apr-10 10:49 UTC
Re: [Xen-users] Request for input: Extended event channel support
On Thu, 2013-04-04 at 14:31 +0100, George Dunlap wrote:
> I think overall, then, that I'm leaning towards recommending that we
> put the decision off until 4.4.

FWIW that's the way I'm leaning too. In the absence of loud clamouring, it seems there is no need to rush, so the default should be to defer.

Ian.
Anil Madhavapeddy
2013-Apr-10 16:14 UTC
Re: [Xen-devel] Request for input: Extended event channel support
On 10 Apr 2013, at 03:45, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> That would be between two Mirage guests, though, unless you are
> envisaging Mirage "processes" with hundreds of thousands of "threads"?

That's correct: most of the channels should be directly between guests and not to dom0. It's convenient to be able to do this via dom0 for some services such as xenstore/xenconsoled, but we could work around that without too much difficulty.

George: my use case certainly isn't a blocker for 4.3. We can maintain local patches for this specialised use case.

-anil