Hello,

In the current model, only one instance of qemu is running for each running HVM domain. We are looking at disaggregating qemu to have, for example, an instance to emulate only network controllers, another to emulate block devices, etc. Multiple instances of qemu would run for a single Xen domain, each one handling a subset of the hardware.

Has someone already looked at this, and potentially already submitted code for qemu? The purpose of this e-mail is to start a discussion and gather opinions on how the qemu developer community would like to see it implemented. A couple of questions come to mind:

1) How hard would it be to untangle "machine" specific (PC hardware) emulation from "device" specific emulation (PCI devices)?

2) How can we achieve disaggregation from a configuration point of view? Currently, the Xen toolstack starts qemu and tells qemu which devices to emulate using the command line. I've heard about a project for creating machine description configuration files for QEMU, which could help greatly in dividing up which hardware to emulate in which instance of qemu. What is the status of this project?

Thank you for your answers,
On Tue, 2012-02-28 at 06:46 -0500, Julien Grall wrote:
> Has someone already looked at it and potentially already submitted code for qemu ?

I'm not aware of any code existing to do this.

There's a bunch of interesting stuff to do on the Xen side to make this work.

Firstly, you would need to add support to the hypervisor for dispatching I/O requests to multiple qemu instances (via multiple io req rings). I think at the moment there is only support for a single ring (or maybe it's one sync and one buffered I/O ring).

You'd also need to make sure that qemu explicitly requests all the MMIO regions it is interested in. Currently the hypervisor forwards any unknown MMIO to qemu, so the explicit registration is probably not done as consistently as it could be. If you want to have N qemus then you need to make sure that at least N-1 of them register for everything they are interested in.

Currently the PCI config space decode is done within qemu, which is a bit tricky if you want to have different emulated PCI devices in different qemu processes. We think it would independently be an architectural improvement to have the hypervisor do the PCI config space decode anyway. This would allow it to forward the I/O to the correct qemu (there are other benefits to this change, e.g. relating to PCI passthrough and the handling of MSI configuration).

Then you'd need to do a bunch of toolstack level work to start and manage the multiple qemu processes instead of the existing single process.
So, a bunch of stuff, but I think it is all reasonable to do individually, and each piece brings its own advantages to the architecture outside of this project too.

> A couple of questions come to mind:

I guess these are mostly qemu side questions. I'm not familiar enough with the internals on that side to really answer.

> I've heard about a project for creating machine description configuration files
> for QEMU which could help greatly in dividing up which hardware to emulate in
> which instance of qemu. What is the status of this project ?

I've only vaguely heard about this, but it certainly seems like functionality which would benefit more than just Xen.

Ian.
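The first work item above — the hypervisor dispatching each I/O request to the right qemu instance's req ring — essentially reduces to a range lookup with a default fallback. A minimal sketch of that routing decision (all names and the flat-array layout are my own illustration, not Xen code; the default server mirrors today's "forward unknown MMIO to qemu" behaviour, which is why only N-1 of N servers need to register everything):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-emulator state: one entry per qemu instance / req ring. */
struct ioreq_server {
    int id;                 /* identifies the qemu instance */
    uint64_t mmio_start;    /* one registered MMIO range, for brevity */
    uint64_t mmio_end;      /* inclusive */
};

/*
 * Return the id of the server whose registered range covers addr.
 * Unclaimed MMIO falls through to the default server, so one qemu
 * can remain the catch-all, as with today's single-process model.
 */
static int dispatch_mmio(const struct ioreq_server *srv, size_t n,
                         uint64_t addr, int default_id)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= srv[i].mmio_start && addr <= srv[i].mmio_end)
            return srv[i].id;
    }
    return default_id;
}
```

For example, with a NIC qemu owning one BAR range and a block qemu owning another, an access outside both ranges would still land on the default instance.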
On 02/28/2012 05:46 AM, Julien Grall wrote:
> We are looking at disaggregating qemu to have, for example, an instance to
> emulate only network controllers, another to emulate block devices, etc...

Why would you want to do this?

Regards,

Anthony Liguori
On Mon, 5 Mar 2012, Anthony Liguori wrote:
> On 02/28/2012 05:46 AM, Julien Grall wrote:
> > We are looking at disaggregating qemu to have, for example, an instance to
> > emulate only network controllers, another to emulate block devices, etc...
>
> Why would you want to do this?

We are trying to disaggregate QEMU, the same way we do with Linux.

On Xen we can run a Linux guest to drive the network card, another Linux guest to drive the SATA controller, another one for the management stack, etc. This helps both with scalability and isolation.

In this scenario it is only natural that we run a QEMU that only emulates a SATA controller in the storage domain, a QEMU that only emulates the network card in the network domain, and everything else in a stubdom.

What's better than using QEMU as emulator? Using three QEMUs per guest as emulators! :-)
On 03/05/2012 04:53 PM, Stefano Stabellini wrote:
> What's better than using QEMU as emulator? Using three QEMUs per guest
> as emulators! :-)

My concern is that this moves the Xen use case pretty far from what the typical QEMU use case would be (running one emulator per guest).

If it was done in a non-invasive way, maybe it would be acceptable, but at a high level I don't see how that's possible.

I almost think you would be better off working to build a second front end (reusing the device model, and nothing else) specifically for Xen. Almost like qemu-io, but instead of using the block layer, use the device model.

Regards,

Anthony Liguori
Anthony Liguori <anthony@codemonkey.ws> writes:
> I almost think you would be better off working to build a second front
> end (reusing the device model, and nothing else) specifically for Xen.
>
> Almost like qemu-io but instead of using the block layer, use the device model.

What's in it for uses other than Xen?

I figure a qemu-dev could help us move towards a more explicit interface between devices and the rest.

qemu-io is useful for testing. Do you think a qemu-dev could become a useful testing tool as well?
On 03/06/2012 02:22 AM, Markus Armbruster wrote:
> qemu-io is useful for testing. Do you think a qemu-dev could become a
> useful testing tool as well?

It all depends on how it develops. I think that's the primary advantage of doing a qemu-dev here for Xen. Instead of shoehorning in a use case that really doesn't fit with the model of qemu-system-*, we can look at a new executable that satisfies another use case and could potentially be used for other things.

The fundamental characteristic of qemu-system-* is "a guest is a single process". That's the defining characteristic to me, and all of the global soup we have deeply ingrains it into our design. Trying to turn it into "half a guest is a single process" is going to be horrific.

OTOH, I can imagine qemu-dev as simply "this process exposes a set of devices and an RPC interface to interact with them". That would surely improve modularity, could be useful for testing, and might evolve into something that is useful outside of Xen.

Regards,

Anthony Liguori
Stefano Stabellini
2012-Mar-12 13:17 UTC
Re: [Qemu-devel] Qemu disaggregation in Xen environment
On Tue, 6 Mar 2012, Anthony Liguori wrote:
> I almost think you would be better off working to build a second front end
> (reusing the device model, and nothing else) specifically for Xen.
>
> Almost like qemu-io but instead of using the block layer, use the device model.

(Sorry for the late reply, I was traveling and sick: bad combination.)

Ideally what we would like is a way to run a QEMU emulator that only builds a machine with the devices we want, let's say just a SATA controller.
My understanding of "Machine description as data" is that it would perfectly fit this use case, so if we had it in QEMU, we wouldn't have any need for a separate "qemu-dev".

Also, considering the way QEMU hooks into Xen, it is rather simple for us to run multiple QEMUs for a single domain; the changes on the QEMU side to do that would be minimal. We just need to introduce a registration mechanism for QEMU to tell Xen which IO events it is the handler of.

Finally, it would also be nice to have a way to restrict at compile time the set of supported emulators, so that we can have a QEMU binary tailored to the machine it is going to emulate. This last step is probably a bit harder than the others, but it makes perfect sense as a follow-up of "Machine description as data".

Now, if you are opposed to having "Machine description as data" in QEMU, then we can do all this in "qemu-dev", even though I am a bit concerned about code duplication between vl.c and the future qemu-dev.c.
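To give the "machine as data" idea some concrete flavour: QEMU already reads ini-style device sections via -readconfig, and one could imagine a machine description extending that so a storage-domain QEMU instantiates nothing but a SATA controller. The fragment below is purely hypothetical: the `[machine]` section and its keys are invented for illustration, though "ich9-ahci" is an existing device model name.

```
# Hypothetical machine-description file: a machine stripped down to
# a single SATA controller for the storage domain's QEMU.
[machine]
  type = "pc"

[device "ahci0"]
  driver = "ich9-ahci"
```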
On 03/05/2012 10:06 PM, Ian Campbell wrote:
> Firstly you would need to add support to the hypervisor for dispatching
> I/O requests to multiple qemu instances (via multiple io req rings). I
> think at the moment there is only support for a single ring (or maybe
> it's one sync and one buffered I/O ring).

I have modified Xen to create "ioreq servers". An ioreq server contains a list of IO ranges and a list of BDFs for which to trap IOs, for a unique instance of QEMU. Each ioreq server can be associated with an event channel; this way we can deliver IO events to different processes. For each QEMU, an ioreq server is created, and QEMU must specify which PCI devices (by BDF) and which IO ranges it handles.

I added some hypercalls:
- to register an ioreq server
- to register/unregister a BDF
- to register/unregister an IO range

For the moment all QEMUs share the same pages (buffered and IO request). For more security, I would like to make these pages private to each ioreq server. I saw that these pages are allocated by the toolstack. Can we assume that the toolstack knows at domain creation time how many QEMUs it is going to spawn?

> You'd also need to make sure that qemu explicitly requests all the MMIO
> regions it is interested in.

I have modified QEMU to register all the IO ranges and PCI devices it needs. All unregistered IO is discarded by Xen.
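The per-server BDF list mentioned above relies on the standard PCI encoding of bus/device/function into 16 bits (8/5/3). A small sketch of how a server-side "do I trap this BDF?" check might look (the struct and function names are mine, not from the actual patches):

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus/device/function the standard PCI way: 8/5/3 bits. */
static inline uint16_t bdf_pack(unsigned bus, unsigned dev, unsigned fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Hypothetical per-ioreq-server list of trapped BDFs. */
struct bdf_list {
    const uint16_t *bdf;
    unsigned n;
};

/* Does this ioreq server handle config accesses for this BDF? */
static int server_handles_bdf(const struct bdf_list *l, uint16_t bdf)
{
    for (unsigned i = 0; i < l->n; i++) {
        if (l->bdf[i] == bdf)
            return 1;
    }
    return 0;
}
```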
> Currently the PCI config space decode is done within qemu which is a bit
> tricky if you are wanting to have different emulated PCI devices in
> different qemu processes. We think it would independently be an
> architectural improvement to have the hypervisor do the PCI config space
> decode anyway.

I have created a patch which allows Xen to catch the 0xcf8 through 0xcff IO port registers. Xen goes through the list of ioreq servers to find out which server can handle the PCI device, and prepares the request. For that I added a new IO request type, IOREQ_TYPE_PCI_CONFIG.

> Then you'd need to do a bunch of toolstack level work to start and
> manage the multiple qemu processes instead of the existing single
> process.

I have begun to modify the toolstack. For the moment, I just handle a new type of device model for my own test.
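The decode step the 0xcf8 trap enables is the standard legacy config-address format: bit 31 is the enable bit, bits 23:16 the bus, 15:11 the device, 10:8 the function, and 7:2 the dword-aligned register offset. A sketch of that decode (struct and function names are illustrative, not Xen's):

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a write to the 0xcf8 config-address port. */
struct pci_cfg_addr {
    unsigned enabled, bus, dev, fn, reg;
};

/* Decode the legacy config-address register:
 * bit 31 = enable, 23:16 bus, 15:11 device, 10:8 function,
 * 7:2 dword-aligned register offset. */
static struct pci_cfg_addr decode_cf8(uint32_t cf8)
{
    struct pci_cfg_addr a = {
        .enabled = (cf8 >> 31) & 1,
        .bus = (cf8 >> 16) & 0xff,
        .dev = (cf8 >> 11) & 0x1f,
        .fn  = (cf8 >> 8) & 0x7,
        .reg = cf8 & 0xfc,
    };
    return a;
}
```

With the bus/dev/fn in hand, the hypervisor can walk the ioreq servers' BDF lists and route the IOREQ_TYPE_PCI_CONFIG request to the owning qemu.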