thr3ads.net - Xen devel - [Xen-devel] Re-design the architecture of Xen [May 2011]

henanwxr

2011-May-23 11:39 UTC

[Xen-devel] Re-design the architecture of Xen

We have researched virtualization for several years. With Xen as a
reference, we have designed a new VMM architecture called the Cooperative
model VMM, and have implemented a prototype system.
We present its principles and some of its details here.

Part1 motivation

B. Domain0 problems
Domain0 has several features:
- It runs a modified operating system.
- It runs at processor privilege level 1.
- It runs as a virtual machine.
- It is the single system managing the hardware.
These features of Domain0 lead to the following issues:
1) Tight coupling
From a performance point of view, the coordination mechanisms between
Domain0 and the VMM (such as hypercalls, event channels and IO rings) can
improve virtualization efficiency. They require, however, more modification
of the guest operating system, and the VMM needs to provide the
corresponding interfaces. The resulting tight coupling between Domain0 and
the VMM means that a VMM implementation must take the characteristics of a
third-party system into account, so its design lacks independence and
flexibility.
2) Privilege level switch
Domain0 runs at processor privilege level 1, so a context switch from the
VMM to Domain0 triggers a processor privilege level switch. If such
switches are frequent (for example, when handling IO requests for a virtual
machine), they add significant processor overhead and hurt the performance
of the virtual machine.
3) Management overhead
Because Domain0 operates as a virtual machine, the VMM has to provide the
usual virtual machine management interfaces for it (creation, resource
allocation, scheduling, destruction, etc.), and this brings administrative
overhead. Domain0 is the main provider of device access and its function is
relatively fixed, so this administrative overhead should be avoided to
reduce the burden on the VMM.
4) Scheduling delay
Domain0 takes part in VMM scheduling together with the other virtual
machines. Because scheduling rotates among them, Domain0 cannot guarantee
timely delivery of its services, which leads to a number of related issues.
First, after the VMM receives an IO request from a virtual machine, Domain0
cannot be notified immediately; only an asynchronous notification similar
to a soft interrupt can be used, and Domain0 checks for and processes it
the next time it runs. Second, the device model in Domain0 is provided by
Qemu, which runs as a process of the guest OS. When Domain0 is not running,
Qemu cannot handle IO requests from virtual machines, which delays the
processing of those requests. Third, the scheduling of other virtual
machines depends on virtual clock interrupts; emulating the virtual clock
in Domain0 leads to problems with virtual clock synchronization, virtual
machine scheduling, and clock synchronization between virtual cores (the
virtual clock implementation has since been moved from Domain0 into the
VMM).
5) IOPM bottleneck
When multiple virtual machines are running, IO requests are quite frequent.
Because Domain0 is the only IOPM (IO process machine) in the entire system,
all IO requests are handled through Domain0, which forms an IOPM
bottleneck. Furthermore, if the IOPM fails and cannot be replaced in time
by an alternative IOPM, the entire system can only be restarted, resulting
in delays or even the collapse of the services of the virtual machines.
The main cause of the Domain0 problems described above is that the IOPM is
virtualized and acts as a subsidiary module of the VMM. Because the natural
role of Domain0 is to provide device access services to the VMM, a possible
solution is to separate the IOPM thoroughly from the VMM while Domain0
keeps providing those services. This involves four aspects:
- Weakening the coupling between the VMM and Domain0 to increase the
  independence of the VMM design.
- Reducing VMM interference with Domain0 so that Domain0 can operate
  independently.
- Establishing interaction between the VMM and Domain0 to ensure that
  Domain0 provides device access services to the VMM.
- Providing multiple IOPMs to achieve load balancing.
With this approach, the operating system does not need much modification to
implement an IOPM, and the IOPM interacts with the VMM through only a small
number of interfaces. By controlling hardware resources directly, the IOPM
changes from a subsidiary module of the VMM into a cooperating module of
the VMM. The cooperative model VMM discussed below implements and verifies
this kind of IOPM.

Part2 Cooperative model VMM

A. Cooperative model description
With the popularity of multi-core processors and large-capacity memory, the
hardware resources of a PC are no longer scarce. In the 1960s, the IBM
S/360 mainframe used a hardware partitioning approach to implement
virtualization, which provides a useful inspiration for virtualization on
the current PC platform.
To address the problems of the Hybrid model, in which the IOPM is
virtualized and tightly coupled with the VMM, hardware partitioning can be
used to let the IOPM control a part of the hardware resources directly. The
IOPM then changes from a virtual machine into a privileged machine, and the
IOPM and the VMM form a cooperative structure. The main control system
consists of two parts: the VMM, which implements processor and memory
virtualization, and the IOPM, which controls peripherals and provides the
device model. More than one IOPM can exist; each IOPM controls one AP,
while the VMM controls the BSP and the remaining APs, as shown in Fig 5.
The cooperative model has the following characteristics:
- Elimination of the tight coupling between the VMM and the IOPM, which
  interact through only a handful of interfaces.
- Independence of the IOPM from VMM monitoring and scheduling.
- Multiple IOPMs running in parallel for load balancing and failure
  replacement.
http://xen.1045712.n5.nabble.com/file/n4418793/1.bmp
Figure 5. Structure of cooperative VMM
B. Interrupt handling
1) IOPM controls right of interrupt reception
Assume that device interrupts are submitted directly to the IOPM. This
appears to shorten the device access path of the IOPM, as shown in Fig 6.
http://xen.1045712.n5.nabble.com/file/n4418793/2.bmp
Figure 6. IOPM controls right of interrupt reception
In this way, the IOPM both receives and processes external interrupts.
Consider, however, the following three situations:
- The IOPM contains a large number of device drivers, whose stability
  affects the security of the IOPM and of the whole system. If the IOPM
  fails because of a device driver failure, the interrupts of the
  corresponding devices can no longer be serviced, so virtual machine IO
  requests cannot be processed.
- In some cases, a small number of special device drivers need to be
  integrated into the VMM, so that IO requests can be handled within the
  VMM without being delivered to an IOPM. This improves device access
  efficiency, for example for devices with a high interrupt frequency
  (clock, network card, etc.).
- To enhance the stability of the whole system, it is desirable to
  distribute drivers across multiple IOPMs, so that a single IOPM failure
  does not bring down the entire system. In this case, the VMM needs to
  control interrupt reception and submit the interrupt to another IOPM.
The analysis above shows that letting the IOPM control interrupt reception
is a serious problem. Interrupt reception and interrupt handling need to be
separated: the VMM receives interrupts, while the IOPMs handle them.
Letting the VMM control interrupt reception achieves device control at
minimal expense.

2) VMM controls right of interrupt reception
To solve the problems that arise when the IOPM controls interrupt
reception, interrupt handling can be improved as follows: an external
interrupt is first submitted to the VMM, which provides an interrupt
routing function. Depending on the actual circumstances, the VMM can handle
the interrupt directly or route it to an appropriate IOPM, as shown in
Fig 7.
http://xen.1045712.n5.nabble.com/file/n4418793/3.bmp
Figure 7. VMM controls right of interrupt reception
The improved VMM has the following characteristics in device processing:
- Interrupts are received and routed by the VMM, which improves the
  flexibility of interrupt handling.
- The VMM directly integrates some of the key device drivers to shorten the
  device access path.
- Device drivers are distributed across multiple IOPMs to achieve load
  balancing and failure replacement.
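
The routing step described above can be modelled roughly as follows. This
is only a toy sketch in Python, not the prototype's code: the class names
(Iopm, InterruptRouter), the IRQ numbers and the "first live IOPM wins"
failover policy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Iopm:
    """Hypothetical stand-in for one IO process machine."""
    name: str
    alive: bool = True
    handled: list = field(default_factory=list)

    def handle(self, irq: int) -> str:
        self.handled.append(irq)
        return self.name


class InterruptRouter:
    """Toy model of the VMM-side routing step (all names are illustrative)."""

    def __init__(self):
        self.vmm_irqs = set()   # IRQs served by drivers built into the VMM
        self.routes = {}        # irq -> ordered list of candidate IOPMs

    def bind_vmm(self, irq: int) -> None:
        self.vmm_irqs.add(irq)

    def bind_iopm(self, irq: int, *iopms: Iopm) -> None:
        self.routes[irq] = list(iopms)

    def dispatch(self, irq: int) -> str:
        # Every external interrupt arrives at the VMM first.
        if irq in self.vmm_irqs:
            return "vmm"
        # Otherwise forward it to the first candidate IOPM that is alive.
        for iopm in self.routes.get(irq, []):
            if iopm.alive:
                return iopm.handle(irq)
        raise LookupError(f"no live handler for IRQ {irq}")
```

With two IOPMs bound to the same IRQ, marking the first one as not alive
makes dispatch() fall through to the second, which is the "failure
replacement" case; an IRQ bound with bind_vmm() never leaves the VMM.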

Part3 Model implementation

Implementing the cooperative VMM requires a division of hardware resources
that eliminates hardware control conflicts between the VMM and the IOPM. On
this basis, an appropriate operating system is selected and transformed
into an IOPM. Currently, the implementation of this model is based on a
dual-processor platform with Intel VT-x, and the IOPM is based on Linux.
A. Hardware division
The hardware division between the IOPM and the VMM is shown in Table 1.
TABLE 1. HARDWARE DIVISION BETWEEN IOPM AND VMM
http://xen.1045712.n5.nabble.com/file/n4418793/4.bmp
1) Processor
The IOPM controls a single processor and cannot perform
multi-processor-related operations. The BSP must run first after the
machine starts and is controlled by the VMM; the VMM can then start an AP
and run the IOPM on it at an appropriate time, so that the VMM and the IOPM
run in parallel.
2) Memory
Physical memory is partitioned between the VMM and the IOPM, but data can
be exchanged through shared memory.
3) IOAPIC
External interrupts must first be submitted to the BSP, where the VMM runs;
the VMM then decides how each interrupt is handled.
4) Clock
Both the VMM and the IOPM need to schedule their internal programs. Since
scheduling depends on clock interrupts, clock interrupts need to be
submitted to the VMM and the IOPM at the same time.
5) IO Device
IO devices are controlled by the IOPM. IO requests of a virtual machine are
submitted to the IOPM through the VMM, and the device is accessed with the
help of the IOPM's device drivers.
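
As a small illustration of the processor part of this division, the
assignment rule (BSP to the VMM, one AP per IOPM, remaining APs to the VMM)
could be written down as below. The function name and the choice of giving
the lowest-numbered APs to the IOPMs are assumptions for this sketch only.

```python
def assign_processors(num_cpus: int, num_iopms: int) -> dict:
    """Map each CPU index to an owner name.

    CPU 0 is treated as the BSP and always stays with the VMM. Each IOPM
    gets exactly one AP; any APs left over go to the VMM.
    """
    if num_cpus < 1 or num_iopms < 0:
        raise ValueError("need at least one CPU and a non-negative IOPM count")
    if num_iopms > num_cpus - 1:
        raise ValueError("each IOPM needs its own AP")
    owners = {0: "vmm"}
    for cpu in range(1, num_cpus):
        # Assumption: the first APs are handed to IOPMs, the rest to the VMM.
        owners[cpu] = f"iopm{cpu - 1}" if cpu <= num_iopms else "vmm"
    return owners
```

On the dual-processor platform mentioned above, assign_processors(2, 1)
gives the BSP to the VMM and the only AP to a single IOPM.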
B. IOPM Implementation
The implementation of the IOPM involves four aspects:
1) Boot IOPM
Traditionally, Linux is loaded by a boot loader such as grub. The Linux
kernel code is divided into two parts, real mode and protected mode.
According to the Linux boot protocol, the boot loader must copy the real
mode code to a location below 1M and parse the kernel header information in
order to copy the protected mode code to the specified location. The boot
loader then jumps to the real mode code, and the operating system takes
control of the machine.
Booting the IOPM from the VMM also needs to simulate this flow. The Linux
real mode code is copied to a free area below 1M. Traditionally, the
protected mode code is located at 1M, which is already occupied by the VMM,
so the protected mode code is copied to another safe area. After the layout
of the IOPM code is complete, the VMM boots the AP; the AP needs to switch
to real mode before executing the IOPM, and then jumps to the start address
of the real mode code. The flow is shown in Fig 8.
http://xen.1045712.n5.nabble.com/file/n4418793/5.bmp
Figure 8. Flow of booting IOPM
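
The header-parsing step of this flow can be sketched as follows. The field
offsets (setup_sects at 0x1F1, boot_flag at 0x1FE, the "HdrS" magic at
0x202) come from the Linux x86 boot protocol; the function name and the
idea of returning the two parts as byte strings are assumptions of this
sketch, and where the prototype actually places the two parts in memory is
not shown here.

```python
import struct

SECTOR = 512
SETUP_SECTS_OFF = 0x1F1   # 1 byte: number of setup sectors
BOOT_FLAG_OFF = 0x1FE     # 2 bytes: must be 0xAA55
HEADER_MAGIC_OFF = 0x202  # 4 bytes: must be b"HdrS"


def split_bzimage(image: bytes):
    """Return (real_mode_code, protected_mode_code) for a bzImage-style blob."""
    if len(image) < HEADER_MAGIC_OFF + 4:
        raise ValueError("image too small to contain a setup header")
    (boot_flag,) = struct.unpack_from("<H", image, BOOT_FLAG_OFF)
    magic = image[HEADER_MAGIC_OFF:HEADER_MAGIC_OFF + 4]
    if boot_flag != 0xAA55 or magic != b"HdrS":
        raise ValueError("not a Linux boot protocol 2.00+ image")
    # A value of 0 means 4 setup sectors, for backwards compatibility.
    setup_sects = image[SETUP_SECTS_OFF] or 4
    # The real mode part is the boot sector plus the setup sectors.
    real_mode_size = (setup_sects + 1) * SECTOR
    return image[:real_mode_size], image[real_mode_size:]
```

A loader in the VMM would then copy the first part below 1M and the second
part to whatever area it has reserved for the IOPM.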
2) Physical memory isolation
To achieve address space isolation and data exchange between the VMM and
the IOPM, the entire physical memory is divided into three parts: the VMM
management zone, the IOPM management zone, and the shared zone. The
management zones take part in dynamic allocation and reclamation by the
memory manager; the shared zone can only be accessed and does not take part
in allocation. The division of physical memory and its properties are shown
in Fig 9.
http://xen.1045712.n5.nabble.com/file/n4418793/6.bmp
Figure 9. Division of physical memory and its properties
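
A minimal model of this three-zone layout is given below. The zone order
(VMM, then shared, then IOPM), the sizes and all names are assumptions for
illustration; only the three zones and the rule that the shared zone is
accessible to both sides but not allocatable come from the description
above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    start: int         # inclusive physical address
    end: int           # exclusive physical address
    allocatable: bool  # False for the shared zone


def make_layout(total: int, vmm_size: int, shared_size: int):
    """Split [0, total) into VMM, shared and IOPM zones (order is assumed)."""
    if vmm_size <= 0 or shared_size <= 0 or vmm_size + shared_size >= total:
        raise ValueError("sizes must leave room for all three zones")
    vmm_end = vmm_size
    shared_end = vmm_end + shared_size
    return [
        Zone("vmm", 0, vmm_end, True),
        Zone("shared", vmm_end, shared_end, False),
        Zone("iopm", shared_end, total, True),
    ]


def may_access(layout, owner: str, addr: int) -> bool:
    """An owner may touch its own zone and the shared zone, nothing else."""
    for zone in layout:
        if zone.start <= addr < zone.end:
            return zone.name in (owner, "shared")
    return False
```

The check in may_access() is the isolation property: the VMM and the IOPM
can each reach their own management zone and the shared zone, but not each
other's management zone.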
3) Communications between VMM and IOPM
The VMM and the IOPM communicate in two situations. First, IO requests
issued by a virtual machine are captured by the VMM and submitted to the
IOPM, which then returns the processing results to the VMM. Second, a user
issues a request to the VMM through the user interface provided by the IOPM
to carry out a virtual machine operation. The communication mechanism is
built on IPIs and shared memory: IPIs are used for message notification
between the IOPM and the VMM, and shared memory is used for temporary
storage of the exchanged data.
4) Shared memory
Shared memory is used for temporary storage of the data exchanged between
the VMM and the IOPM. To prevent buffer overflow, the shared memory needs
to be organized. It is divided into four parts: the VMM control area, the
IOPM control area, the VMM data area, and the IOPM data area. The shared
control pointers stored in the control areas are used to operate on the
data packets in the data areas. Each data area is organized as a ring: the
VMM data area temporarily stores data packets sent from the VMM to the
IOPM, and the IOPM data area temporarily stores data packets sent from the
IOPM to the VMM.
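
One direction of such a channel can be modelled as a bounded ring with two
control pointers. The sketch below is a toy Python model, not the
prototype's layout: the class name, the head/tail naming and the notify
callback (standing in for an IPI) are assumptions.

```python
class PacketRing:
    """One direction of the shared-memory channel (toy model, names assumed).

    `head` and `tail` play the role of the control pointers kept in a
    control area; `slots` plays the role of the matching data area.
    """

    def __init__(self, capacity: int, notify=None):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.slots = [None] * capacity
        self.head = 0   # index of the next packet the consumer reads
        self.tail = 0   # index of the next slot the producer writes
        self.notify = notify or (lambda: None)   # stand-in for sending an IPI

    def __len__(self) -> int:
        return self.tail - self.head

    def put(self, packet) -> bool:
        if len(self) == len(self.slots):
            return False   # ring full: refuse instead of overwriting
        self.slots[self.tail % len(self.slots)] = packet
        self.tail += 1
        self.notify()      # tell the other side that data is waiting
        return True

    def get(self):
        if self.head == self.tail:
            return None    # ring empty
        packet = self.slots[self.head % len(self.slots)]
        self.head += 1
        return packet
```

Two such rings, one per direction, would correspond to the VMM data area
and the IOPM data area. Refusing a put() on a full ring is one simple way
to get the overflow protection mentioned above; a real implementation
shared between processors would also need memory barriers, which this
single-threaded model leaves out.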
Others
…..

--
View this message in context:
http://xen.1045712.n5.nabble.com/Re-design-the-architecture-of-Xen-tp4418793p4418793.html
Sent from the Xen - Dev mailing list archive at Nabble.com.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Konrad Rzeszutek Wilk

2011-May-24 14:24 UTC

Re: [Xen-devel] Re-design the architecture of Xen

On Mon, May 23, 2011 at 04:39:37AM -0700, henanwxr wrote:
> http://xen.1045712.n5.nabble.com/file/n4418793/6.bmp We have researched
> virtualization for several years, with the reference of Xen, we have design
> a new VMM architecture called Cooperative model VMM，and have implemented a
> prototype system.
> We present its principle and part of details here.
> 
> 
> Part1 motivation
> 
> 
> B. Domain0 problems
> Domain0 has several features: 
Features or disadvantages?
> 	Running modified operating system. 
What does 'modified' mean?
> 	Running on processor with privilege level 1 
> 	Running in a form of virtual machine
> 	Single system managing hardware
Right, but that does not have to be the case..
> These features of Domain0 bring the following issues:
> 1) tight coupling
> From a performance point of view, the coordination of Domain0 and VMM (such
> as: hypercall), event channel and IO ring can improve virtualization
> efficiency, which, however, requires more modification of guest operating
> system. Also, VMM needs to provide the corresponding interface. The tight
I am still lost what you mean by 'more modification' ?
> coupling formed between Domain0 and VMM results that VMM implementations
> must take third-party system characteristics into account, design is lack of
such as?
> independence and flexibility. 
> 2) privilege level switch
> Domain0 is running on the processor with privilege level 1, context switch
Not necessarily.
> from the VMM to Domain0 will trigger processor privilege level switches. If
> operation of this type is more frequent (such as IO request operation for a
> virtual machine), it will result in larger processor overhead, impacting the
I think you are referring to sysctl. That can be eliminated by having
a 32-bit OS.
> performance of virtual machine.
> 3) overhead of management
> Operating as a virtual machine, Domain0 needs VMM to provide appropriate
> virtual machine managing interface, such as: creation, resource allocation,
> scheduling, and destruction, etc., the resulting administrative overhead.
> Domain0, as the main provider of device access, its function is relatively
> fixed and administrative overhead should be avoided to reduce the burden on
> VMM. 
So.. remove the administration from Dom0. But why? What are the 
disadvantages of doing this in Dom0?
> 4) scheduling Delay 
> Domain0 and other virtual machines take part in VMM scheduling, due to
> scheduling rotation characteristics, Domain0 can not guarantee timely
> delivery of services, which results a number of related issues. First, after
> VMM receive IO request from virtual machine, Domain0 could not be
> immediately notice, only asynchronous notice way which similar to soft
> interrupt can be used, and Domian0 will test and process it when running.
> Second, device model of Domain0 is provided by Qemu, which is running as a
> process of guest OS. When Domain0 is not running, Qemu can not handle IO
> requests from virtual machine, resulting in delay of processing IO requests.
If you are using legacy hardware in QEMU - sure. But nowadays every Linux
distro has drivers to use the PV drivers which omit QEMU. Also they are
available under Windows (even WHQL certified ones).
Furthermore the stub-domains eliminate this.

Anyhow, I stopped reading after this..

Samuel Thibault

2011-May-24 14:42 UTC

Re: [Xen-devel] Re-design the architecture of Xen

Err, aren't you simply describing something similar to domDs, the Driver
domains that dom0 disaggregation plans already imagined?  I don't know
how far it is currently finished, but there is not so much redesign
needed: domUs frontends simply have to talk to domDs backends instead of
dom0 (not so big overhaul, mostly device paths in xenstore and details),
domDs being allowed to drive hardware directly independently from each
other (can be done through VT-d).

Samuel
