Hello,

In the current model, only one instance of qemu is running for each running HVM domain. We are looking at disaggregating qemu to have, for example, an instance to emulate only network controllers, another to emulate block devices, etc. Multiple instances of qemu would run for a single Xen domain, each one handling a subset of the hardware.

Has someone already looked at this, and potentially already submitted code for qemu? The purpose of this e-mail is to start a discussion and gather opinions on how the qemu developer community would like to see it implemented. A couple of questions come to mind:

1) How hard would it be to untangle "machine" specific (PC hardware) emulation from "device" specific emulation (PCI devices)?

2) How can we achieve disaggregation from a configuration point of view? Currently, the Xen toolstack starts qemu and tells qemu which devices to emulate using the command line. I've heard about a project for creating machine description configuration files for QEMU, which could help greatly in dividing up which hardware to emulate in which instance of qemu. What is the status of this project?

Thank you for your answers,
On Tue, 2012-02-28 at 06:46 -0500, Julien Grall wrote:
> Has someone already looked at it and potentially already submitted code for qemu ?

I'm not aware of any code existing to do this.

There's a bunch of interesting stuff to do on the Xen side to make this work.

Firstly, you would need to add support to the hypervisor for dispatching I/O requests to multiple qemu instances (via multiple io req rings). I think at the moment there is only support for a single ring (or maybe it's one sync and one buffered I/O ring).

You'd also need to make sure that qemu explicitly requests all the MMIO regions it is interested in. Currently the hypervisor forwards any unknown MMIO to qemu, so the explicit registration is probably not done as consistently as it could be. If you want to have N qemus then you need to make sure that at least N-1 of them register for everything they are interested in.

Currently the PCI config space decode is done within qemu, which is a bit tricky if you want to have different emulated PCI devices in different qemu processes. We think it would independently be an architectural improvement to have the hypervisor do the PCI config space decode anyway. This would allow it to forward the I/O to the correct qemu (there are other benefits to this change, e.g. relating to PCI passthrough and the handling of MSI configuration).

Then you'd need to do a bunch of toolstack level work to start and manage the multiple qemu processes instead of the existing single process.
So, a bunch of stuff, but I think it is all reasonable to do individually, and each piece brings its own advantages to the architecture outside of this project too.

> A couple of questions come to mind:

I guess these are mostly qemu side questions. I'm not familiar enough with the internals on that side to really answer.

> I've heard about a project for creating machine description configuration files
> for QEMU which could help greatly in dividing up which hardware to emulate in
> which instance of qemu. What is the status of this project ?

I've only vaguely heard about this, but it certainly seems like functionality which would benefit more than just Xen.

Ian.
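The first work item above — the hypervisor dispatching each I/O request to the right qemu instance's req ring — essentially reduces to a range lookup with a default fallback. A minimal sketch of that routing decision (all names and the flat-array layout are my own illustration, not Xen code; the default server mirrors today's "forward unknown MMIO to qemu" behaviour, which is why only N-1 of N servers need to register everything):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-emulator state: one entry per qemu instance / req ring. */
struct ioreq_server {
    int id;                 /* identifies the qemu instance */
    uint64_t mmio_start;    /* one registered MMIO range, for brevity */
    uint64_t mmio_end;      /* inclusive */
};

/*
 * Return the id of the server whose registered range covers addr.
 * Unclaimed MMIO falls through to the default server, so one qemu
 * can remain the catch-all, as with today's single-process model.
 */
static int dispatch_mmio(const struct ioreq_server *srv, size_t n,
                         uint64_t addr, int default_id)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= srv[i].mmio_start && addr <= srv[i].mmio_end)
            return srv[i].id;
    }
    return default_id;
}
```

For example, with a NIC qemu owning one BAR range and a block qemu owning another, an access outside both ranges would still land on the default instance.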
On 02/28/2012 05:46 AM, Julien Grall wrote:
> We are looking at disaggregating qemu to have, for example, an instance to
> emulate only network controllers, another to emulate block devices, etc...

Why would you want to do this?

Regards,

Anthony Liguori
On Mon, 5 Mar 2012, Anthony Liguori wrote:
> On 02/28/2012 05:46 AM, Julien Grall wrote:
> > We are looking at disaggregating qemu to have, for example, an instance to
> > emulate only network controllers, another to emulate block devices, etc...
>
> Why would you want to do this?

We are trying to disaggregate QEMU, the same way we do with Linux.

On Xen we can run a Linux guest to drive the network card, another Linux guest to drive the SATA controller, another one for the management stack, etc. This helps both with scalability and isolation.

In this scenario it is only natural that we run a QEMU that only emulates a SATA controller in the storage domain, a QEMU that only emulates the network card in the network domain, and everything else in a stubdom.

What's better than using QEMU as emulator? Using three QEMUs per guest as emulators! :-)
On 03/05/2012 04:53 PM, Stefano Stabellini wrote:
> What's better than using QEMU as emulator? Using three QEMUs per guest
> as emulators! :-)

My concern is that this moves the Xen use case pretty far from what the typical QEMU use case would be (running one emulator per guest).

If it was done in a non-invasive way, maybe it would be acceptable, but at a high level I don't see how that's possible.

I almost think you would be better off working to build a second front end (reusing the device model, and nothing else) specifically for Xen. Almost like qemu-io, but instead of using the block layer, use the device model.

Regards,

Anthony Liguori
Anthony Liguori <anthony@codemonkey.ws> writes:
> I almost think you would be better off working to build a second front
> end (reusing the device model, and nothing else) specifically for Xen.
>
> Almost like qemu-io but instead of using the block layer, use the device model.

What's in it for uses other than Xen?

I figure a qemu-dev could help us move towards a more explicit interface between devices and the rest.

qemu-io is useful for testing. Do you think a qemu-dev could become a useful testing tool as well?
On 03/06/2012 02:22 AM, Markus Armbruster wrote:
> qemu-io is useful for testing. Do you think a qemu-dev could become a
> useful testing tool as well?

It all depends on how it develops. I think that's the primary advantage of doing a qemu-dev here for Xen. Instead of shoehorning in a use case that really doesn't fit with the model of qemu-system-*, we can look at a new executable that satisfies another use case and could potentially be used for other things.

The fundamental characteristic of qemu-system-* is "a guest is a single process". That's the defining characteristic to me, and all of the global soup we have deeply ingrains it into our design. Trying to turn it into "half a guest is a single process" is going to be horrific.

OTOH, I can imagine qemu-dev as simply "this process exposes a set of devices and an RPC interface to interact with them". That would surely improve modularity, could be useful for testing, and might evolve into something that is useful outside of Xen.

Regards,

Anthony Liguori
Stefano Stabellini
2012-Mar-12 13:17 UTC
Re: [Qemu-devel] Qemu disaggregation in Xen environment
On Tue, 6 Mar 2012, Anthony Liguori wrote:
> I almost think you would be better off working to build a second front end
> (reusing the device model, and nothing else) specifically for Xen.
>
> Almost like qemu-io but instead of using the block layer, use the device model.

(Sorry for the late reply, I was traveling and sick: bad combination.)

Ideally what we would like is a way to run a QEMU emulator that only builds a machine with the devices we want, let's say just a SATA controller.
My understanding of "Machine description as data" is that it would perfectly fit this use case, so if we had it in QEMU, we wouldn't have any need for a separate "qemu-dev".

Also, considering the way QEMU hooks into Xen, it is rather simple for us to run multiple QEMUs for a single domain; the changes on the QEMU side to do that would be minimal. We just need to introduce a registration mechanism for QEMU to tell Xen which IO events it is the handler of.

Finally, it would also be nice to have a way to restrict at compile time the set of supported emulators, so that we can have a QEMU binary tailored to the machine it is going to emulate. This last step is probably a bit harder than the others, but it makes perfect sense as a follow-up of "Machine description as data".

Now, if you are opposed to having "Machine description as data" in QEMU, then we can do all this in "qemu-dev", even though I am a bit concerned about code duplication between vl.c and the future qemu-dev.c.
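To give the "machine as data" idea some concrete flavour: QEMU already reads ini-style device sections via -readconfig, and one could imagine a machine description extending that so a storage-domain QEMU instantiates nothing but a SATA controller. The fragment below is purely hypothetical: the `[machine]` section and its keys are invented for illustration, though "ich9-ahci" is an existing device model name.

```
# Hypothetical machine-description file: a machine stripped down to
# a single SATA controller for the storage domain's QEMU.
[machine]
  type = "pc"

[device "ahci0"]
  driver = "ich9-ahci"
```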
On 03/05/2012 10:06 PM, Ian Campbell wrote:
> Firstly you would need to add support to the hypervisor for dispatching
> I/O requests to multiple qemu instances (via multiple io req rings). I
> think at the moment there is only support for a single ring (or maybe
> it's one sync and one buffered I/O ring).

I have modified Xen to create "ioreq servers". An ioreq server contains a list of IO ranges and a list of BDFs for which to trap IOs, for a unique instance of QEMU. Each ioreq server can be associated with an event channel; this way we can deliver IO events to different processes. For each QEMU, an ioreq server is created, and QEMU must specify which PCI devices (by BDF) and which IO ranges it handles.

I added some hypercalls:
- to register an ioreq server
- to register/unregister a BDF
- to register/unregister an IO range

For the moment all QEMUs share the same pages (buffered and IO request). For more security, I would like to make these pages private to each ioreq server. I saw that these pages are allocated by the toolstack. Can we assume that the toolstack knows at domain creation time how many QEMUs it is going to spawn?

> You'd also need to make sure that qemu explicitly requests all the MMIO
> regions it is interested in.

I have modified QEMU to register all the IO ranges and PCI devices it needs. All unregistered IO is discarded by Xen.
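The per-server BDF list mentioned above relies on the standard PCI encoding of bus/device/function into 16 bits (8/5/3). A small sketch of how a server-side "do I trap this BDF?" check might look (the struct and function names are mine, not from the actual patches):

```c
#include <assert.h>
#include <stdint.h>

/* Pack bus/device/function the standard PCI way: 8/5/3 bits. */
static inline uint16_t bdf_pack(unsigned bus, unsigned dev, unsigned fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Hypothetical per-ioreq-server list of trapped BDFs. */
struct bdf_list {
    const uint16_t *bdf;
    unsigned n;
};

/* Does this ioreq server handle config accesses for this BDF? */
static int server_handles_bdf(const struct bdf_list *l, uint16_t bdf)
{
    for (unsigned i = 0; i < l->n; i++) {
        if (l->bdf[i] == bdf)
            return 1;
    }
    return 0;
}
```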
> Currently the PCI config space decode is done within qemu which is a bit
> tricky if you are wanting to have different emulated PCI devices in
> different qemu processes. We think it would independently be an
> architectural improvement to have the hypervisor do the PCI config space
> decode anyway.

I have created a patch which allows Xen to catch the 0xcf8 through 0xcff IO port registers. Xen goes through the list of ioreq servers to find out which server can handle the PCI device, and prepares the request. For that I added a new IO request type, IOREQ_TYPE_PCI_CONFIG.

> Then you'd need to do a bunch of toolstack level work to start and
> manage the multiple qemu processes instead of the existing single
> process.

I have begun to modify the toolstack. For the moment, I just handle a new type of device model for my own test.
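The decode step the 0xcf8 trap enables is the standard legacy config-address format: bit 31 is the enable bit, bits 23:16 the bus, 15:11 the device, 10:8 the function, and 7:2 the dword-aligned register offset. A sketch of that decode (struct and function names are illustrative, not Xen's):

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a write to the 0xcf8 config-address port. */
struct pci_cfg_addr {
    unsigned enabled, bus, dev, fn, reg;
};

/* Decode the legacy config-address register:
 * bit 31 = enable, 23:16 bus, 15:11 device, 10:8 function,
 * 7:2 dword-aligned register offset. */
static struct pci_cfg_addr decode_cf8(uint32_t cf8)
{
    struct pci_cfg_addr a = {
        .enabled = (cf8 >> 31) & 1,
        .bus = (cf8 >> 16) & 0xff,
        .dev = (cf8 >> 11) & 0x1f,
        .fn  = (cf8 >> 8) & 0x7,
        .reg = cf8 & 0xfc,
    };
    return a;
}
```

With the bus/dev/fn in hand, the hypervisor can walk the ioreq servers' BDF lists and route the IOREQ_TYPE_PCI_CONFIG request to the owning qemu.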