Hello,

We, the University of Karlsruhe and UNSW/NICTA, have been working on a technique to automate para-virtualization, in the hope of simplifying the maintenance of the various guest OSes, and we would like to share our results to date.

The basis of our solution is instruction substitution at the assembler level, in order to replace the virtualization-sensitive operations of the guest OS. The virtualization-sensitive operations include instructions and memory accesses (such as to page tables or device registers).

In summary:

The patch to Linux 2.6 for IA32 is roughly 80 lines, primarily manual annotations of page table accesses (similar to Linux's user-access annotations). There are a few additional changes to the build process.

The automation relies on a runtime support module, which provides the CPU model and runs within the address space of the guest Linux. By running within the address space, and by batching virtualization state changes, we achieve performance comparable to para-virtualization. Since the virtualization is at the ISA level, the runtime support module is mostly guest-OS independent; it can support other guest OSes.

The instruction substitution takes place at OS boot, which permits us to use a single OS binary for bare metal (including VT and VMware) and for any supported hypervisor, such as Xen and our L4 microkernel. To provide rewrite space for the instructions, the OS binary is prepared with NOP scratch space.

Our current research is to enable run-time migration between incompatible hypervisors, or between different versions of the same hypervisor, by rewriting the instruction substitutions at the time of migration. Additionally, we envisage that one could install a hypervisor underneath an OS that runs on bare metal.

We have a high-speed network device emulation, for the DP83820 driver, based on the sensitive memory instruction substitution (an additional several-line patch to enable the manual annotations).
If the guest OS uses the DP83820 device, then it has high-speed access to devices running in Dom0. The speed is comparable to using a customized device driver. By using the DP83820 device, a guest OS can migrate between different hypervisors, since the state is encapsulated in a model rather than in a driver.

Our performance data so far comes only from the Netperf benchmark, which exercises many of the virtualization-sensitive instructions. In our results, which focus on Xen and L4 with Linux 2.6.9, we see negligible performance differences. Additionally, when running the same OS binary on raw hardware, we see an increase in performance (due to different trace cache behavior).

The solution currently works for IA32 and Itanium, but the approach is applicable to other architectures (we are working on Power and ARM).

Our current research also aims at complete automation of the process, to avoid any patches to Linux. We have made good progress here.

We will shortly release the code under a BSD license.

-Josh

Our initial performance data:

Test system: 2.8 GHz P4 Prescott with 256 MB RAM for each VM
Guest OS: Linux 2.6.9 configured for XT-PIC
Client system: 1.4 GHz P4
Connection: Intel gigabit, with the e1000 driver, via a gigabit switch
Benchmark: Netperf with a 256K socket buffer, 1 GB of data transferred

Description                                       send/receive (Mbit/s)
Annotated Linux on raw hardware                   834/712
Native Linux on raw hardware                      827/713
Annotated Linux on Xen                            834/711
XenoLinux                                         830/711
Annotated Linux on L4                             830/709
L4Ka::Linux                                       775/712
Annotated Linux with DP83820 model on L4          771/707
L4Ka::Linux with custom network driver on L4      772/708

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Joshua LeVasseur wrote:
> Hello,

Hi Joshua,

> We, University of Karlsruhe and UNSW/NICTA, have been working on a technique
> to automate para-virtualization, in the hopes of simplifying the maintenance
> of the various guest OSs, and would like to share our results to date.

Very interesting...

> The basis of our solution is instruction substitution at the assembler level
> in order to replace the virtualization-sensitive operations of the guest OS.
> The virtualization-sensitive operations include instructions and memory
> accesses (such as to page tables or device registers).

So you annotate accesses to MMIO or page tables? Was this for performance, or was it not possible to emulate MMIO operations and trap writes to the page table? If you were able to avoid having to annotate these things, I presume you could virtualize Linux with no modifications? Even if this resulted in performance degradation, I can imagine scenarios where having this option would be very useful (especially for supporting legacy distributions).

> annotations). There are a few additional changes for the build process.

Would it be possible to package your tools as a cross-compiler environment, so that all you had to do is set CROSS_COMPILE appropriately?

> Our current research is to enable run-time migrations between incompatible
> hypervisors, or between different versions of the same hypervisor, by
> rewriting the instruction substitutions at time of migration. Additionally,
> we envisage that one can install a hypervisor underneath an OS which runs on
> bare metal.

Does this mean that you maintain the patch table and support unpatching a patched image, or do you simply keep a copy of the unpatched kernel? Any thoughts on supporting kernel modules? Would you have to prepatch a module?

> We have a high speed network device emulation, for the DP83820 driver, based
> on the sensitive memory instruction substitution (an additional several line
> patch to enable manual annotations).
> If the guest OS uses the DP83820
> device, then it has high-speed access to devices running in Dom0. The speed
> is comparable to using a customized device driver. By using the DP83820
> device, a guest OS can migrate between different hypervisors, since the
> state is encapsulated in a model, and not a driver.

Have you implemented any other emulated devices?

> We will shortly release the code under a BSD license.

Great! Looking forward to seeing the code. Looks like you guys have been doing really cool stuff :-)

Regards,

Anthony Liguori
Hi Anthony,

On Apr 6, 2005, at 05:56, Anthony Liguori wrote:

> Very interesting...

Thanks for the good questions and the interest.

>> The basis of our solution is instruction substitution at the assembler level
>> in order to replace the virtualization-sensitive operations of the guest OS.
>> The virtualization-sensitive operations include instructions and memory
>> accesses (such as to page tables or device registers).
>
> So you annotate access to mmio or page tables? Was this for
> performance or was it not possible to emulate mmio operations and trap
> writes to the page table?

We annotate both, but in different annotation domains. We apply the annotations for performance, especially the MMIO annotations. The page table references aren't as critical in many workloads, but if heavy page table activity is expected, then they are probably a necessary optimization.

> If you were able to avoid having to annotate these things, I presume
> you could virtualize Linux with no modifications? Even if this
> resulted in performance degradation, I can imagine scenarios where
> having this option would be very useful (especially for supporting
> legacy distributions).

This is true. It would be possible to avoid the annotations and to rely on traps instead, although we prefer to automate these annotations, which is currently our active work. Additionally, it is possible to avoid heavy trapping on the page tables by tracking page table accesses via the reference bits.

>> annotations). There are a few additional changes for the build process.
>
> Would it be possible to package your tools as a cross-compiler
> environment so that all you had to do is set CROSS_COMPILE
> appropriately?

Probably ... we don't handle 16-bit code right now, so the CROSS_COMPILE solution needs a little work to avoid annotating the 16-bit code.
The implication is that the current solution jumps to the 32-bit entry point and relies on the runtime module preparing "physical" memory with all the guest-specific boot loader information (I guess this is the "start of day" in Xen terminology).

>> Our current research is to enable run-time migrations between incompatible
>> hypervisors, or between different versions of the same hypervisor, by
>> rewriting the instruction substitutions at time of migration. Additionally,
>> we envisage that one can install a hypervisor underneath an OS which runs on
>> bare metal.
>
> Does this mean that you maintain the patch table and support
> unpatching a patched image or do you simply keep a copy of the
> unpatched kernel?

Either solution works ... it is just an implementation (or user interface/management) issue.

> Any thoughts on supporting kernel modules? Would you have to prepatch
> a module?

Yes, modules must be supported in the long term. Modules need to be annotated and then patched at load time, which probably requires collaboration from the OS to announce the installation of a new module. The implication is that third-party modules benefit from automated para-virtualization.

> Have you implemented any other emulated devices?

Basic platform devices. We're working on a high-performance disk and on the IO-APIC. Port I/O is particularly nice and easy, especially for ports <= 0xff, and thus the XT-PIC emulation lends itself to nice rewriting.

> Great! Look forward to seeing the code. Looks like you guys have
> been doing really cool stuff :-)

Thanks,
Josh