We have researched virtualization for several years. Taking Xen as a reference, we have designed a new VMM architecture called the Cooperative model VMM and have implemented a prototype system. We present its principles and some of the details here.

Part1 motivation

B. Domain0 problems

Domain0 has several features:
Running a modified operating system.
Running on the processor at privilege level 1.
Running in the form of a virtual machine.
Being the single system that manages the hardware.

These features of Domain0 bring the following issues:

1) tight coupling
From a performance point of view, the coordination mechanisms between Domain0 and the VMM (such as hypercalls, event channels and IO rings) can improve virtualization efficiency. However, they require more modification of the guest operating system, and the VMM needs to provide the corresponding interfaces. The tight coupling formed between Domain0 and the VMM means that the VMM implementation must take the characteristics of a third-party system into account, so the design lacks independence and flexibility.

2) privilege level switch
Domain0 runs on the processor at privilege level 1, so a context switch from the VMM to Domain0 triggers a processor privilege level switch. If operations of this type are frequent (such as IO request operations for a virtual machine), they cause larger processor overhead and impact the performance of virtual machines.

3) overhead of management
Operating as a virtual machine, Domain0 needs the VMM to provide the corresponding virtual machine management interfaces, such as creation, resource allocation, scheduling and destruction, which results in administrative overhead. As the main provider of device access, Domain0 has a relatively fixed function, so this administrative overhead should be avoided to reduce the burden on the VMM.
4) scheduling delay
Domain0 and the other virtual machines all take part in VMM scheduling. Because of the rotating nature of scheduling, Domain0 cannot guarantee timely delivery of its services, which leads to a number of related issues. First, after the VMM receives an IO request from a virtual machine, Domain0 cannot be notified immediately; only an asynchronous notification similar to a soft interrupt can be used, and Domain0 checks for and processes it the next time it runs. Second, the device model of Domain0 is provided by Qemu, which runs as a process of the guest OS. When Domain0 is not running, Qemu cannot handle IO requests from virtual machines, which delays the processing of those requests. Third, the scheduling of other virtual machines depends on virtual clock interrupts; having Domain0 simulate the virtual clock leads to problems with virtual clock synchronization, virtual machine scheduling, and clock synchronization between virtual cores (currently, the virtual clock implementation has been migrated from Domain0 into the VMM).

5) IOPM bottleneck
When multiple virtual machines are running, IO requests become quite frequent. Because Domain0 is the only IOPM (IO process machine) in the entire system, all IO requests must be handled through Domain0, which forms the IOPM bottleneck. Furthermore, if this IOPM fails and cannot be replaced promptly by an alternative IOPM, the entire system can only be restarted, resulting in delays or even the collapse of virtual machine services.

The main cause of the Domain0 problems mentioned above is that the IOPM is virtualized and acts as a subsidiary module of the VMM. Because the essential role of Domain0 is to provide device access services to the VMM, a possible solution is to separate the IOPM thoroughly from the VMM while Domain0 continues to provide these services. This can be approached from four aspects:
Weakening the coupling between the VMM and Domain0 to increase the independence of the VMM design.
Reducing VMM interference with Domain0 to give Domain0 the right to operate independently.
Establishing interaction between the VMM and Domain0 to ensure that Domain0 provides device access services to the VMM.
Providing multiple IOPMs to achieve load balancing.

In accordance with the above considerations, the operating system does not need much modification to implement an IOPM, and the IOPM interacts with the VMM through only a small number of interfaces. By controlling hardware resources directly, the IOPM changes from a subsidiary module of the VMM into a cooperating module of the VMM. The cooperative model VMM discussed below implements and verifies the IOPM described above.

Part2 Cooperative model VMM

A. Cooperative model description

With the popularity of multi-core processors and large-capacity memory, hardware resources in PCs are no longer scarce. In the 1960s, the IBM S/360 mainframe used a hardware partitioning approach to implement virtualization, which provides a useful inspiration for current PC platform virtualization. To address the problems of the IOPM being virtualized and tightly coupled with the VMM in the Hybrid model, hardware partitioning can be used to let the IOPM control part of the hardware resources directly, converting it from a virtual machine into a privileged machine and forming a structure in which the IOPM and the VMM cooperate.

The main control system consists of two parts: the VMM, which implements processor and memory virtualization, and the IOPM, which controls peripherals and provides the device model. More than one IOPM can exist; each IOPM controls one AP (application processor), while the VMM controls the BSP (bootstrap processor) and the remaining APs, as shown in Fig 5.

The cooperative model has the following characteristics:
Elimination of tight coupling between the VMM and the IOPM, which interact through only a handful of interfaces.
Independence of the IOPM from VMM monitoring and scheduling.
Multiple IOPMs running in parallel for load balancing and failure replacement.

http://xen.1045712.n5.nabble.com/file/n4418793/1.bmp
Figure 5. Structure of cooperative VMM

B. Interrupt handling

1) IOPM controls the right of interrupt reception
Assume that device interrupts are delivered directly to the IOPM. This appears to shorten the IOPM's device access path, as shown in Fig 6.

http://xen.1045712.n5.nabble.com/file/n4418793/2.bmp
Figure 6. IOPM controls right of interrupt reception

In this way, the IOPM holds the rights of both external interrupt reception and interrupt processing. However, consider the following three situations:
The IOPM contains a large number of device drivers, whose stability affects the security of the IOPM and of the whole system. If the IOPM fails because of a device driver failure, the corresponding device interrupts cannot be responded to, so IO requests from virtual machines cannot be processed.
In some cases, a small number of special device drivers need to be integrated into the VMM so that IO requests can be handled within the VMM without being delivered to the IOPM, improving the efficiency of device access. Examples are certain devices with high interrupt frequencies (clock, network card, etc.).
To enhance the stability of the whole system, drivers should be distributed across multiple IOPMs so that a single IOPM failure cannot bring down the entire system. In this case, the VMM needs to control the right of interrupt reception and deliver interrupts to the other IOPMs.

The above analysis shows that letting the IOPM control the right of interrupt reception has a major problem, and that interrupt reception and interrupt handling need to be separated: the VMM receives interrupts, while the IOPM handles them. Having the VMM control the right of interrupt reception achieves device control at minimal expense.

2) VMM controls the right of interrupt reception
To solve these problems of the IOPM controlling the right of interrupt reception, interrupt handling can be improved as follows: external interrupts are delivered to the VMM first, and the VMM provides an interrupt routing function that routes each interrupt to the appropriate IOPM.
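The routing step can be sketched in C as follows. This is a minimal illustration only, assuming a static per-IRQ routing table held by the VMM on the BSP. The names (`route_table`, `vmm_route_irq`, `send_ipi_to_iopm`, etc.), the table sizes and the backup-IOPM fallback policy are hypothetical and not taken from the prototype, and the IPI to the IOPM's AP is simulated by recording its target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes; the prototype's real limits are not given in the post. */
#define MAX_IRQ  32
#define MAX_IOPM 4

#define ROUTE_HANDLED_BY_VMM (-2)
#define ROUTE_DROPPED        (-1)

enum irq_owner { OWNER_NONE, OWNER_VMM, OWNER_IOPM };

struct irq_route {
    enum irq_owner owner;
    int primary_iopm;              /* IOPM that normally services this line */
    int backup_iopm;               /* replacement IOPM, or -1 if none */
    void (*vmm_handler)(int irq);  /* driver integrated into the VMM */
};

static struct irq_route route_table[MAX_IRQ];  /* zeroed: OWNER_NONE */
static bool iopm_alive[MAX_IOPM];

/* Stand-in for sending an IPI to the AP that runs the given IOPM. */
static int last_ipi_target = -1;
static int last_ipi_irq = -1;
void send_ipi_to_iopm(int iopm, int irq)
{
    last_ipi_target = iopm;
    last_ipi_irq = irq;
}

/* Example driver integrated into the VMM (e.g. the clock): counts ticks. */
static unsigned long vmm_clock_ticks;
void vmm_clock_handler(int irq)
{
    (void)irq;
    vmm_clock_ticks++;
}

void route_to_vmm(int irq, void (*handler)(int))
{
    assert(irq >= 0 && irq < MAX_IRQ && handler != NULL);
    route_table[irq] = (struct irq_route){ OWNER_VMM, -1, -1, handler };
}

void route_to_iopm(int irq, int primary, int backup)
{
    assert(irq >= 0 && irq < MAX_IRQ);
    assert(primary >= 0 && primary < MAX_IOPM);
    assert(backup >= -1 && backup < MAX_IOPM);
    route_table[irq] = (struct irq_route){ OWNER_IOPM, primary, backup, NULL };
}

/* Called by the VMM for every external interrupt. Returns the IOPM the
 * interrupt was forwarded to, ROUTE_HANDLED_BY_VMM, or ROUTE_DROPPED. */
int vmm_route_irq(int irq)
{
    if (irq < 0 || irq >= MAX_IRQ)
        return ROUTE_DROPPED;

    const struct irq_route *r = &route_table[irq];
    switch (r->owner) {
    case OWNER_VMM:
        r->vmm_handler(irq);
        return ROUTE_HANDLED_BY_VMM;
    case OWNER_IOPM:
        if (iopm_alive[r->primary_iopm]) {
            send_ipi_to_iopm(r->primary_iopm, irq);
            return r->primary_iopm;
        }
        /* Failure replacement: fall back to the backup IOPM if it is alive. */
        if (r->backup_iopm >= 0 && iopm_alive[r->backup_iopm]) {
            send_ipi_to_iopm(r->backup_iopm, irq);
            return r->backup_iopm;
        }
        return ROUTE_DROPPED;
    default:
        return ROUTE_DROPPED;
    }
}
```

For example, a clock line can be bound to the VMM-integrated handler with `route_to_vmm()`, and a disk line to IOPM 0 with IOPM 1 as its replacement via `route_to_iopm()`; if IOPM 0 is marked dead, `vmm_route_irq()` forwards that interrupt to IOPM 1. A real VMM would program the IOAPIC redirection entries and deliver vectors through the local APIC rather than calling a C stub.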
External interrupts are first delivered to the VMM; depending on the actual circumstances, the VMM either handles an interrupt directly or delivers it to an IOPM, as shown in Fig 7.

http://xen.1045712.n5.nabble.com/file/n4418793/3.bmp
Figure 7. VMM controls right of interrupt reception

The improved VMM has the following characteristics in device processing:
Interrupts are received and routed by the VMM to improve the flexibility of interrupt handling.
The VMM directly integrates some key device drivers to shorten the device access path.
Device drivers are distributed across multiple IOPMs to achieve load balancing and failure replacement.

Part3 Model implementation

Implementing the cooperative VMM requires dividing the hardware resources so as to eliminate hardware control conflicts between the VMM and the IOPM. On this basis, an appropriate operating system is selected and transformed into an IOPM. Currently, this model is implemented on a dual-processor platform with Intel VT-x, and the IOPM is based on Linux.

A. Hardware division

The division of hardware between the IOPM and the VMM is shown in Table 1.

TABLE 1. HARDWARE DIVISION BETWEEN IOPM AND VMM
http://xen.1045712.n5.nabble.com/file/n4418793/4.bmp

1) Processor
The IOPM controls a single processor and cannot perform multiprocessor-related operations. After the machine starts, the BSP runs first and is controlled by the VMM; the VMM then starts the AP and runs the IOPM on it at an appropriate time, so that the VMM and the IOPM run in parallel.

2) Memory
Physical memory is partitioned between the VMM and the IOPM, but data can be exchanged through shared memory.

3) IOAPIC
External interrupts must first be delivered to the BSP, on which the VMM runs; the VMM decides how each interrupt is handled.

4) Clock
Both the VMM and the IOPM need to schedule their internal programs. Since scheduling depends on clock interrupts, clock interrupts need to be delivered to both the VMM and the IOPM at the same time.
5) IO Device
IO devices are controlled by the IOPM. IO requests from virtual machines are submitted to the IOPM through the VMM, and the devices are accessed through the IOPM's device drivers.

B. IOPM Implementation

Implementation of the IOPM involves four aspects:

1) Boot IOPM
Traditionally, Linux is loaded by a boot loader such as GRUB. The Linux kernel code is divided into two parts: real-mode code and protected-mode code. According to the Linux boot protocol, the boot loader copies the real-mode code to a location below 1 MB, parses the kernel header information, and copies the protected-mode code to the specified location. The boot loader then jumps to the real-mode code, and the operating system takes control of the machine.

Booting the IOPM from the VMM also needs to simulate this flow. The Linux real-mode code is copied to free space below 1 MB. Traditionally, the protected-mode code is located at 1 MB, but that area is already occupied by the VMM, so the protected-mode code is copied to another safe zone. After laying out the IOPM code, the VMM boots the AP; the AP needs to switch to real mode before executing the IOPM and then jump to the start address of the real-mode code. The flow is shown in Fig 8.

http://xen.1045712.n5.nabble.com/file/n4418793/5.bmp
Figure 8. Flow of booting IOPM

2) Physical memory isolation
To achieve address space isolation as well as data exchange between the VMM and the IOPM, the entire physical memory is divided into three parts: the VMM management zone, the IOPM management zone, and the shared zone. The management zones take part in the dynamic allocation and reclamation performed by the memory manager, whereas the shared zone can only be accessed and does not take part in allocation. The division of physical memory and its properties are shown in Fig 9.

http://xen.1045712.n5.nabble.com/file/n4418793/6.bmp
Figure 9.
Division of physical memory and its property

3) Communications between VMM and IOPM
The VMM and the IOPM generally communicate in two situations. First, IO requests issued by a virtual machine are captured by the VMM and submitted to the IOPM, and the IOPM then returns the processing results to the VMM. Second, a user issues a request to the VMM through the user interface provided by the IOPM to perform virtual machine operations. The communication mechanism is built on IPIs and shared memory: IPIs are used for message notification between the IOPM and the VMM, and shared memory is used for temporary storage of the exchanged data.

4) Shared memory
Shared memory is used for temporary storage of data exchanged between the VMM and the IOPM. To prevent buffer overflow, the shared memory needs to be organized. It is divided into four parts: the VMM control area, the IOPM control area, the VMM data area, and the IOPM data area. The public control pointers stored in the control areas are used to operate on data packets in the data areas. Each data area is organized as a ring: the VMM data area temporarily stores data packets sent from the VMM to the IOPM, and the IOPM data area temporarily stores data packets sent from the IOPM to the VMM.

Others …..

--
Sent from the Xen - Dev mailing list archive at Nabble.com.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Konrad Rzeszutek Wilk
2011-May-24 14:24 UTC
Re: [Xen-devel] Re-design the architecture of Xen
On Mon, May 23, 2011 at 04:39:37AM -0700, henanwxr wrote:
> http://xen.1045712.n5.nabble.com/file/n4418793/6.bmp We have researched
> virtualization for several years, with the reference of Xen, we have design
> a new VMM architecture called Cooperative model VMM,and have implemented a
> prototype system.
> We present its principle and part of details here.
>
>
> Part1 motivation
>
>
> B. Domain0 problems
> Domain0 has several features:

Features or disadvantages?

> Running modified operating system.

What does 'modified' mean?

> Running on processor with privilege level 1
> Running in a form of virtual machine
> Single system managing hardware

Right, but that does not have to be the case.

> These features of Domain0 bring the following issues:
> 1) tight coupling
> From a performance point of view, the coordination of Domain0 and VMM (such
> as: hypercall), event channel and IO ring can improve virtualization
> efficiency, which, however, requires more modification of guest operating
> system. Also, VMM needs to provide the corresponding interface. The tight

I am still lost as to what you mean by 'more modification'?

> coupling formed between Domain0 and VMM results that VMM implementations
> must take third-party system characteristics into account, design is lack of

Such as?

> independence and flexibility.
> 2) privilege level switch
> Domain0 is running on the processor with privilege level 1, context switch

Not necessarily.

> from the VMM to Domain0 will trigger processor privilege level switches. If
> operation of this type is more frequent (such as IO request operation for a
> virtual machine), it will result in larger processor overhead, impacting the

I think you are referring to sysctl. That can be eliminated by having a 32-bit OS.

> performance of virtual machine.
> 3) overhead of management
> Operating as a virtual machine, Domain0 needs VMM to provide appropriate
> virtual machine managing interface, such as: creation, resource allocation,
> scheduling, and destruction, etc., the resulting administrative overhead.
> Domain0, as the main provider of device access, its function is relatively
> fixed and administrative overhead should be avoided to reduce the burden on
> VMM.

So.. remove the administration from Dom0. But why? What are the disadvantages of doing this in Dom0?

> 4) scheduling Delay
> Domain0 and other virtual machines take part in VMM scheduling, due to
> scheduling rotation characteristics, Domain0 can not guarantee timely
> delivery of services, which results a number of related issues. First, after
> VMM receive IO request from virtual machine, Domain0 could not be
> immediately notice, only asynchronous notice way which similar to soft
> interrupt can be used, and Domian0 will test and process it when running.
> Second, device model of Domain0 is provided by Qemu, which is running as a
> process of guest OS. When Domain0 is not running, Qemu can not handle IO
> requests from virtual machine, resulting in delay of processing IO requests.

If you are using legacy hardware in QEMU - sure. But nowadays every Linux distro has drivers to use the PV drivers, which bypass QEMU. Also they are available under Windows (even WHQL certified ones). Furthermore, the stub-domains eliminate this.

Anyhow, I stopped reading after this..
Err, aren't you simply describing something similar to domDs, the Driver domains that dom0 disaggregation plans already imagined? I don't know how far it is currently finished, but there is not so much redesign needed: domUs frontends simply have to talk to domDs backends instead of dom0 (not so big overhaul, mostly device paths in xenstore and details), domDs being allowed to drive hardware directly independently from each other (can be done through VT-d).

Samuel