This is the first draft of the use case on "Hotplug for Virtualization". Please review it and share your comments. Next week (when Mary is back) we'll post this use case on both the Hotplug SIG webpage and the use case page (mail will be sent to let you know when it's posted).

There's also a new terminology page dedicated to the virtualization-specific terms referred to in the use case. Please review that page as well:
http://developer.osdl.org/maryedie/HOTPLUG/VirtTerminology.shtml

Thanks for your input.
Martine

HOTPLUG for VIRTUALIZATION USE CASE

---------------------------------
Use Case: Hotplug for Virtualization
---------------------------------
Version 0.1 (Draft)
Last Modified Date: 08/19/05

Copyright (c) 2005 by The Open Source Development Lab, Inc. Verbatim copying and distribution of this document is permitted in any medium, provided this notice is preserved. Draft copies of this document may not be posted publicly without indicating the draft status.

---------------------------------
Table of Contents
  Description
  Target Acceptance
  Participants/Roles
  Scenarios
  Dependencies
  Implementation Notes
  References
---------------------------------

---------------------------------
Description
---------------------------------
Virtualization provides an abstraction layer that gives the user access to virtual machines that are independent of each other, regardless of the size or number of physical systems used (our concept of virtualization includes using multiple separate systems as the hardware base). This allows for higher quality-of-service resource allocation, increases security through resource isolation, provides transparent resource redirection, leads to better hardware consolidation and ultimately lowers management cost.

The mapping between physical and virtual resources can be achieved through numerous mechanisms, which are not the focus of this use case.
However, to keep our focus on the needs of the majority of the open source community, we limit the scope of this use case to the virtual machine monitors known as Type-I VMMs, which include Xen, VMware ESX and Virtual Iron (TM) VFe. For further details on the various types of virtual machine monitors, refer to R.P. Goldberg's article (see reference [Goldberg74]).

Within the scope of the Type-I VMM, the virtual layer can be seen as a combination of the virtual machine monitor and some "special" virtual machines. We will refer to the kernel associated with the VMM as the "VMM kernel". The kernel and OS associated with a virtual machine will be called the "guest kernel" and "guest OS" respectively.

Our goal here is 1) to determine the role of hotplug in virtualization and 2) to identify the requirements to support the various operations, at both the physical and the virtualization layer, that make use of hotplug.

In this document we use the commonly known hotplug-specific terminology defined at developer.osdl.org/maryedie/HOTPLUG/Terminology.shtml as well as the commonly known virtualization-specific terminology defined at developer.osdl.org/maryedie/HOTPLUG/VirtTerminology.shtml

---------------------------------
Target Acceptance
---------------------------------
An application running on a virtual machine should be able to expect maximum reliability from the system, have access to any resources it needs at any given time without disruption to its execution, and be unaffected by any configuration changes happening at the hardware level. Replacement of hardware components, and addition or removal of physical or virtual components, should be totally transparent to the application. In this context, components are system hardware resources such as processors, memory, I/O devices, and nodes, where a node is any combination of processors, memory and I/O combined as a hotpluggable unit.
The requirements for hotplug support of virtualization should be independent of the choice of virtualization approach, and that support should be integrated into the mainline kernel.

Virtualization will provide a perfect test environment for different hotplug features. This use case will help the Hotplug SIG define the appropriate test scenarios to support virtualization. We can also use this description to do a gap analysis for the hotplug code that is either already in the kernel or currently being developed.

---------------------------------
Participants/Roles
---------------------------------
* Systems Administrator --- Special class of user that has special privileges on a given system. This is a role held by an individual who acts as the administrator for a system.

* Application Administrator --- Special class of user that has special privileges for a given application. This is a role held by an individual who acts as the administrator for all aspects of an application.

* User --- Any user on the system. This is a role held by all individuals using a system. Users interact with the system through the processes associated with the applications they are using. Root users are users who have root privileges; they are typically system administrators. Privileged users have some, but not all, of the root user privileges; they are typically operations staff members.

---------------------------------
Scenarios
---------------------------------
Four scenarios are considered:

-------------------------
1. Serviceability (hotplug at physical layer)
-------------------------
In this sub-case the System Administrator needs the ability to remove or replace failing components. Unfortunately, CPU failures tend to be fatal and usually don't give any warning. Fortunately, they're also very infrequent.
Because they are usually fatal, a hot-remove scenario is unlikely for CPUs. (However, if a processor fails and is removed while the system is down, the System Administrator needs the option to reboot immediately and hot-add the replacement.) In contrast, memory and I/O often give adequate warning that they're failing, via single-bit or parity errors, which provides an opportunity to replace them before they cause a system failure.

-------------------------------------
2. Capacity management (hotplug at physical layer)
-------------------------------------
System Administrators need the option to add more physical resources to the virtualization layer to create larger virtual machines, or to relocate physical resources when balancing hardware resources across multiple virtualization layers to support specific workloads.

------------------------------------------------
3. Migration of the virtualization layer (hotplug at physical layer)
------------------------------------------------
In this sub-case the System Administrator adds some resources and removes others as a way to migrate the virtualization layer onto different hardware. "Different hardware" may mean completely different hardware, or it could simply mean upgrades of existing hardware. The migration could also be accomplished on a system that provides hard partitioning.

------------------------------------------------
4. Virtual resources management (hotplug at the virtualization layer)
------------------------------------------------
This sub-case deals with hotplug of virtual resources to and from the OS instances in the virtual machines. The purpose is capacity management: giving each virtual machine exactly the resources it requires to support its application workload, and making those resources apparent to the guest OS, while leaving the remainder available for other virtual machines.
The need for hotplug of resources to and from the guest OS(es) depends on how the virtualization layer and the OS(es) interact. However, the hotplug features required to support either case should be mainstream.

------------------------------------------------
Workflow for Scenario (1) on Serviceability
------------------------------------------------
This scenario covers the expected succession of events when a component shows signs of failure, such as multiple parity errors for memory. We assume that some sort of event log analyzer will detect that a component is displaying signs of possible failure and will either post a message at a predetermined location or send a message to a dedicated thread to take action. The choice between posting a message and automatically isolating the faulty hardware component from the rest of the system should be made at boot time, most likely through a configuration option. If a message is posted, it is up to the system administrator to request that the component be isolated and eventually replaced using hotplug functionality.

A hotplug event with a remove action needs to be generated to inform the host kernel that the component needs to be hot-removed. The hotplug event handler also has to notify the virtualization layer that a hardware resource will be eliminated, so that each virtual machine currently using it either has it removed from its resources or has it replaced by another equivalent physical resource that is available.

------------------------------------------------
Workflow for Scenario (2) on Capacity management
------------------------------------------------
This scenario covers the expected succession of events when a system administrator decides that the existing physical resources are no longer sufficient either to cover the needs of the current virtual machines or to allow for the creation of new virtual machines needed to complete the project.
The virtualization layer may or may not provide a management tool to help the system administrator identify such needs. Such a management tool could also provide a means to tell the virtualization layer that a new component needs to be added and that a hotplug event to add that component needs to be generated.

------------------------------------------------
Workflow for Scenario (3) on Migration of the virtualization layer
------------------------------------------------
This scenario covers the expected succession of events when a system administrator decides that a set of existing physical resources must be replaced, either in the context of an upgrade or for full replacement of one of the platforms that contributes to the hardware resources. The latter case can of course only occur in a configuration in which the hardware resources are made up of multiple separate platforms. It is the responsibility of the system administrator either to inform the VMM directly of the request for hot-remove of those components or to provide the management tool with the information required to handle the hotplug event.

The challenge in this specific scenario is that several components will be removed at the same time. If the mechanism used to transfer the VMM to other resources can be made aware that multiple components are being hot-removed, the operation may be more efficient and coherent than if each component is removed individually. After all components have been successfully hot-removed, the physical components are replaced and the appropriate actions are taken to trigger hot-add of the new components. The specifics of how the hot-add is triggered depend heavily on the VMM itself (for example, on what the VMM kernel is or whether the VMM has its own embedded management tool).
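The benefit of telling the transfer mechanism about all doomed components at once can be illustrated with a small model. This is only a sketch under stated assumptions: the `VirtLayer` class, its fields and the component and VM names are hypothetical and do not correspond to any real VMM interface. It counts how many times a virtual machine has to be remapped when components are removed one by one versus as a batch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtLayer:
    """Hypothetical model of which VM uses which physical component."""
    spare: List[str]                                      # unassigned components
    in_use: Dict[str, str] = field(default_factory=dict)  # component -> VM name
    moves: int = 0                                        # VM remappings performed

    def hot_remove(self, doomed: List[str]) -> None:
        """Remove the `doomed` components, remapping affected VMs to spares.

        Spares that are themselves in `doomed` are dropped first, so they
        are never chosen as substitutes within the same request.
        """
        doomed_set = set(doomed)
        self.spare = [c for c in self.spare if c not in doomed_set]
        for comp in doomed:
            vm = self.in_use.pop(comp, None)
            if vm is None:
                continue  # component was idle, nothing to remap
            if not self.spare:
                raise RuntimeError(f"no substitute available for {comp}")
            self.in_use[self.spare.pop(0)] = vm
            self.moves += 1


def remap_count(batch: bool) -> int:
    """Remove cpu0 and cpu1 either in one request or one at a time."""
    layer = VirtLayer(spare=["cpu1", "cpu2"], in_use={"cpu0": "vm-a"})
    if batch:
        layer.hot_remove(["cpu0", "cpu1"])
    else:
        for comp in ["cpu0", "cpu1"]:
            layer.hot_remove([comp])
    return layer.moves
```

In this model, removing the two CPUs individually first moves vm-a onto cpu1, which is itself about to be removed, and then moves it again onto cpu2. The batched request knows cpu1 is doomed and moves vm-a directly to cpu2.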
------------------------------------------------
Workflow for Scenario (4) on Virtual resources management
------------------------------------------------
This scenario covers the expected succession of events when redistribution of resources is required to address the current needs of the guest OSes as their workloads vary. An application administrator or a privileged user who knows the application's workload requirements may ask the VMM to allocate a specific type or amount of resources to a given virtual machine. We assume that the guest OS running on the virtual machine can support all types of hotplug events. For this use case we also assume that any CPU that has been hot-added to a virtual machine is fully dedicated to that VM.

From the OS's point of view, the hotplug event is handled exactly as it would be if the OS were running directly on the hardware, except that when a resource such as a CPU is removed, the request is passed to the VMM instead of down to the PAL. While each specific implementation of virtualization may lead to a different interaction process between the VMM and the OS running on each VM, the mechanism for redistributing the virtual resources is the same from a hotplug point of view.

---------------------------------
Dependencies
---------------------------------
** An event log analyzer needs to be present in the system to detect when a component shows signs of failure.
** The guest OS supports logical hotplugging of all possible components.
** Changes may be needed in the system firmware to support the various hotplug operations.

---------------------------------
Implementation Notes
---------------------------------
In the previous 3 scenarios one has to take into account the possibility that the replacement hardware is of a different type than the original, for example a more recent version of a device or a CPU with a different speed. The hot-add operation for each component is responsible for handling such a case.
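As a concrete illustration of the guest-side logical hotplug that Scenario 4 and the Dependencies section rely on, a Linux kernel built with CPU hotplug support exposes a per-CPU `online` file under `/sys/devices/system/cpu`. The sketch below wraps that file. The helper names and the `base` parameter (which lets the functions be pointed at a scratch directory) are assumptions made for illustration, and how a given VMM reacts when a guest takes a CPU offline is outside what this shows.

```python
from pathlib import Path

# Standard sysfs location for per-CPU controls on Linux.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def cpu_is_online(cpu: int, base: Path = SYSFS_CPU) -> bool:
    """Return True if the kernel reports logical CPU `cpu` as online."""
    return (base / f"cpu{cpu}" / "online").read_text().strip() == "1"


def set_cpu_online(cpu: int, online: bool, base: Path = SYSFS_CPU) -> None:
    """Ask the kernel to bring logical CPU `cpu` online or take it offline.

    On a real system this requires root privileges, and the boot CPU
    may not expose an `online` file at all.
    """
    (base / f"cpu{cpu}" / "online").write_text("1" if online else "0")
```

A privileged user could call `set_cpu_online(1, False)` to take CPU 1 offline and `set_cpu_online(1, True)` to bring it back. Whether the OS runs on bare hardware or in a virtual machine, the request goes through the same kernel interface.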
---------------------------------
References
---------------------------------
[Goldberg74] Robert P. Goldberg, "Survey of Virtual Machine Research", IEEE Computer, pp. 34-45, June 1974