Here's an initial cut at an oVirt Host task list. After we refine this a
little with some discussion, I'll prioritize these.

One thing that complicates the design is supporting static addressing of
oVirt hosts. Making the assumption that DHCP can be used and configured for
oVirt makes things much simpler. I won't make the DHCP assumption below,
but would appreciate people's thoughts on whether or not static addressing
support is important.

1. Image Creation:

Right now the oVirt Host is created using scripts in the oVirt host
repository that use livecd-creator to make ISO, USB and PXE formats.

I think it makes sense to start with the tools that Chris L. has already
been working with and migrate new functionality in, rather than starting
with other tools.

a. Integrate whitelist functionality into the image creation process to
   reduce image size.
b. Determine the minimal set of files that the host image needs, to
   construct a baseline whitelist.
c. Determine the minimal set of kernel modules to support a wide platform
   base (i.e. keep disk and net modules; remove things like sound, isdn,
   etc...)
d. Construct a repeatable build process so that images can be built daily
   for testing.
e. Construct a utility that detects the minimal set of kernel modules for
   a given platform and generates a config file that can be fed to the
   build system for generating hardware-platform-specific images.
f. Construct a method for taking the generic host image and removing all
   kernel modules not applicable to a specific platform.
g. Construct a method for taking an image and adding site-specific
   configuration information to it (IP addresses of the oVirt Mgmt Server
   and FreeIPA Server). This is only necessary to support hosts that
   require static IP configuration -and- do not have persistent storage
   to store configuration information. (If we decide to support static IP
   config we could make the presence of persistent storage a
   requirement...)

The goal is to get the disk footprint down to something like 32, 64 or
128MB. The base images we build will not be hardware platform specific,
so 32MB might be a long shot. But after step e) above, we should fit into
32MB.

2. Installation

We have to account for two methods of distributing oVirt: pre-installation
by OEMs and install by IT admins. Pre-install by OEM implies that the
platform has flash or a reserved partition on the primary hard disk.
Install by IT can be done by setting up PXE boot, adding an external
bootable USB disk, booting to CD or installing to onboard flash/disk.

The livecd/liveusb disk should allow either booting and running directly
or installation onto platform flash/disk.

3. Persistent Configuration Storage

Since the host is non-persistent and may or may not have access to
persistent storage, this could become tricky. It's further complicated by
the need to support both DHCP and static addressing of the host.

The following things need to be stored for the host:
a. Kerberos keytab or SSL certificate
b. IP configuration (if using static configuration)
c. IP addresses of the FreeIPA server and oVirt Management Server (if
   using static configuration)

Here's where things could be much simpler if DHCP were required. Without
static addressing, b is not necessary and c can be transmitted as custom
DHCP fields (iirc). Then the only thing that needs to be cached on
persistent storage on the host is the auth credentials. (NOTE: Does
Microsoft's DHCP server allow custom DHCP fields???)
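To make the custom-fields idea concrete, here's a rough sketch of pulling
the two server addresses out of the option area of a DHCP reply. The
option codes 224/225 are arbitrary picks from the site-specific range,
not anything we've agreed on:

    import socket

    OVIRT_MGMT_OPT = 224  # hypothetical code for the oVirt Mgmt Server IP
    OVIRT_IPA_OPT = 225   # hypothetical code for the FreeIPA server IP

    def parse_dhcp_options(options):
        """Walk the TLV-encoded option area of a DHCP reply (the bytes
        after the magic cookie) and return a {code: raw value} dict."""
        result, i = {}, 0
        while i < len(options):
            code = options[i]
            if code == 0:        # pad option, single byte
                i += 1
                continue
            if code == 255:      # end option
                break
            length = options[i + 1]
            result[code] = options[i + 2:i + 2 + length]
            i += 2 + length
        return result

    def ovirt_servers(options):
        """Return (mgmt server IP, FreeIPA server IP), or None for each
        option the DHCP server didn't send."""
        opts = parse_dhcp_options(options)
        mgmt = opts.get(OVIRT_MGMT_OPT)
        ipa = opts.get(OVIRT_IPA_OPT)
        return (socket.inet_ntoa(mgmt) if mgmt else None,
                socket.inet_ntoa(ipa) if ipa else None)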
OEM Pre-Install (will always have flash/disk storage)
    DHCP Addressing
        Flash/Disk Storage/TPM
            Auth credentials can be stored in a file or in TPM
    Static Addressing
        Flash/Disk Storage/TPM
            Auth credentials/IP config/Server IPs can be stored in a file
            If TPM is present, it can be used to store auth info

IT Install (may not have flash or disk)
    DHCP Addressing
        Flash/Disk Storage/TPM
            Auth credentials can be stored in a file or in TPM
        No Persistent Storage (CD or PXE Boot)
            Auth can be stored in the host image (separate image for each
            host) (NOTE: Not secure for PXE boot. Recommend that we only
            allow CD boot in this case)
    Static Addressing
        Flash/Disk Storage/TPM
            Auth credentials/IP config/Server IPs can be stored in a file
            If TPM is present, it can be used to store auth info
        No Persistent Storage (CD or PXE Boot)
            Auth/IP config/Server IPs can be stored in the host image
            (separate image for each host) (NOTE: Not secure for PXE
            boot, as above)

4. Initial Configuration

When the host comes up the first time it needs to know the IP address and
password of the oVirt Mgmt server so it can get its auth credentials set
up. Here's how a typical setup might look:

a. Cable up the new hardware and get MAC addrs for each iface. Need to
   record which MAC is for:
   * management network (NOTE: should be the only iface that PXE boots if
     PXE is being used)
   * storage network
   * normal networks
   (these don't all have to be separate; there could be just a single
   interface on the box used for management, storage and normal traffic)
b. Set up DNS/DHCP servers. If using static addressing, skip DHCP setup.
   (This is a manual step that the IT admin would do, i.e. we're not
   trying to automate this process.)
c. Boot the oVirt Host for the first time. The kernel cmdline takes
   options like (see the sketch after this list):
   * ovirt_net=static (to indicate that static IP config should be
     prompted for during boot)
   * ovirt_auth_pw=<password> (password to use when connecting to the
     oVirt Mgmt server to register the host and retrieve auth
     information)
   * ovirt_init (this is specified to let the host know that initial
     config needs to happen; otherwise a normal boot occurs)
d. During boot, if ovirt_net=static is specified, the admin is prompted
   for the IP config for each interface and for which interface is the
   mgmt interface. This can be done with an anaconda-style interface.
e. Auth Setup:
   * If ovirt_auth_pw is not specified: auth information is expected to
     be either on the host disk image or on a connected USB key. (It
     could be that the oVirt host image is on internal flash/disk and the
     admin supplies auth info on a USB thumb drive.)
   * If ovirt_auth_pw is specified: after the network comes up, the mgmt
     server is contacted and the host is registered using the supplied
     password. The auth information is sent back to the oVirt Host.
f. The host is checked for persistent storage and TPM devices. If a TPM
   is found it is initialized so that auth info can be stored there.
   * If either a USB or hard disk is found, it is checked for a partition
     with the label "OVIRT". If that exists it is used for persistent
     storage. If it does not, an "OVIRT" partition is created on
     available unpartitioned space. (How big should it be?)
   * The USB/disk is also checked for a swap partition. If none is found,
     a swap partition is created using unpartitioned space.
g. The host stores auth/IP information on persistent storage/TPM for
   future boots.
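A minimal sketch of how the first-boot code might pick those options out
of the kernel command line (the option names are the ones proposed above;
everything else is illustrative):

    def parse_ovirt_cmdline(path="/proc/cmdline"):
        """Collect ovirt_* options from the kernel command line."""
        opts = {}
        with open(path) as f:
            for token in f.read().split():
                if not token.startswith("ovirt_"):
                    continue
                if "=" in token:
                    key, value = token.split("=", 1)
                    opts[key] = value
                else:
                    opts[token] = True  # bare flag, e.g. ovirt_init
        return opts

    # e.g. {'ovirt_init': True, 'ovirt_net': 'static', 'ovirt_auth_pw': '...'}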
5. KVM Work

* Need to work on support for Live Migration
* Need to make sure that power management and suspend/resume of guests
  work as expected

6. Storage Support

* libvirt storage API to support fibre channel
* Work on using LVM on top of iSCSI so that we don't need to have a 1-1
  mapping between LUNs and guests
* libvirt already supports NFS; we just need to put plumbing in place so
  that oVirt also supports NFS. (This is more of an issue for the
  management interface than for the oVirt Host.)

7. Performance Monitoring/Auditing/Health

* collectd is used presently to send performance stats to the management
  server. Is this the right solution? I don't have a better one, but
  suggestions are appreciated.
* libvirt talks to auditd to send audit info to the FreeIPA server
* Need to create a health monitoring daemon that communicates with the
  mgmt server:
  - Sends a heartbeat at a configurable interval
  - Sends status changes of the host (including machine check exceptions
    and other errors)
  - Sends VM state changes

8. Clustering

I'm light on clustering knowledge, so fill me in if I'm talking crazy
nonsense...

Two types of clustering: physical machines and virtual machines.

Physical clustering is not particularly useful in and of itself, since
the whole point of what we're trying to do is abstract the applications
further from the hardware.

Clustering of guests should be done. The hardware fence would be replaced
by a paravirt driver in the guest that communicates with the host. If two
machines A and B are clustered and on different physical hardware, A is
the primary and B is the backup. B monitors the application on A. If A
appears to go down, B uses the PV driver to tell the host that it needs
to kill A. The host tells libvirt that A needs to be destroyed, and that
command is sent over libvirt comms to the physical host that is hosting
A.

If the physical hardware hosting A is down, that libvirt command won't
succeed. So in this case, a hardware fence would be required to make sure
that the physical host is disconnected and does not power on again.

This probably needs a lot of refinement. So someone who knows clustering
better than I do should feel free to help out here...

9. Image Updates

Host images should be updated as binary blobs (initially a complete
overwrite, and eventually perhaps a binary diff). Updating can be done in
a few ways:

a. For machines that PXE boot, this is simply putting a new PXE image in
   place.
b. For machines that boot from CD, a new livecd needs to be created and
   physically put in the machines.
c. For machines that boot from USB/hard disk, an update daemon can run on
   the host that polls the mgmt server for new images (rough sketch
   below). When a new image is found it is retrieved and stored on the
   persistent disk, and grub is updated to boot to it. If boot of the new
   image fails, the next boot falls back to the old image. (This means
   that persistent storage should be twice the compressed image size,
   plus some additional room for config.)
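To give a feel for 9c, a very rough sketch of the poll/fetch loop. The
URL layout, file names and the grub fallback wiring are all invented for
illustration:

    import os, time, urllib.request

    MGMT_URL = "https://mgmt.example.com/ovirt"  # hypothetical mgmt server
    STORE = "/mnt/ovirt"                         # the persistent "OVIRT" partition

    def current_version():
        try:
            with open(os.path.join(STORE, "image.version")) as f:
                return f.read().strip()
        except IOError:
            return ""

    def poll_once():
        with urllib.request.urlopen(MGMT_URL + "/host-image/latest") as resp:
            latest = resp.read().decode().strip()
        if latest == current_version():
            return
        # Keep the new blob next to the running one; grub fallback entries
        # (not shown) let the old image boot if the new one fails.
        with urllib.request.urlopen(MGMT_URL + "/host-image/" + latest) as resp, \
             open(os.path.join(STORE, "image-%s.img" % latest), "wb") as out:
            out.write(resp.read())
        with open(os.path.join(STORE, "image.version"), "w") as f:
            f.write(latest)

    if __name__ == "__main__":
        while True:
            poll_once()
            time.sleep(3600)  # the poll interval would be configurable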
That's a mouthful. Thoughts and criticisms appreciated.

Perry

--
|=- Red Hat, Engineering, Emerging Technologies, Boston -=|
|=- Email: pmyers at redhat.com -=|
|=- Office: +1 412 474 3552 Mobile: +1 703 362 9622 -=|
|=- GnuPG: E65E4F3D 88F9 F1C9 C2F3 1303 01FE 817C C5D2 8B91 E65E 4F3D -=|


Some things I left off that are important...

> 5. KVM Work
>
> * Need to work on support for Live Migration
> * Need to make sure that power management and suspend/resume of guests
>   work as expected

* Ballooning memory support
* PV Drivers (disk and net) including drivers for Windows

Perry
On Sun, Mar 16, 2008 at 11:43:02PM -0400, Perry N. Myers wrote:
> 1. Image Creation:
>
> a. Integrate whitelist functionality into the image creation process to
>    reduce image size.
> b. Determine the minimal set of files that the host image needs, to
>    construct a baseline whitelist.

IMHO, a pure whitelist approach is not very maintainable long term. As we
track new Fedora RPM updates, new files often appear & we'll miss them
with a pure whitelist. I'd go for a combined white/black list so we can
be broadly inclusive and only do fine-grained whitelisting in specific
areas (rough sketch below).

> c. Determine the minimal set of kernel modules to support a wide
>    platform base (i.e. keep disk and net modules; remove things like
>    sound, isdn, etc...)
> d. Construct a repeatable build process so that images can be built
>    daily for testing.
> e. Construct a utility that detects the minimal set of kernel modules
>    for a given platform and generates a config file that can be fed to
>    the build system for generating hardware-platform-specific images.
> f. Construct a method for taking the generic host image and removing
>    all kernel modules not applicable to a specific platform.

I'd put e) & f) fairly low on the list, unless someone has a compelling
short term need to make a *tiny* embedded image for specific hardware.

> g. Construct a method for taking an image and adding site-specific
>    configuration information to it (IP addresses of the oVirt Mgmt
>    Server and FreeIPA Server). This is only necessary to support hosts
>    that require static IP configuration -and- do not have persistent
>    storage to store configuration information. (If we decide to support
>    static IP config we could make the presence of persistent storage a
>    requirement...)
>
> The goal is to get the disk footprint down to something like 32, 64 or
> 128MB. The base images we build will not be hardware platform specific,
> so 32MB might be a long shot. But after step e) above, we should fit
> into 32MB.

64MB is probably a worthy immediate goal to aim for given that the
current live image is about 85MB & we know there's stuff that can be
killed. Again, unless we have a compelling hardware use case that
requires 32MB, I'd not bother aiming for the 32MB mark in the
short-medium term. There's other more broadly useful stuff we can do....

> OEM Pre-Install (will always have flash/disk storage)
>     DHCP Addressing
>         Flash/Disk Storage/TPM
>             Auth credentials can be stored in a file or in TPM
>     Static Addressing
>         Flash/Disk Storage/TPM
>             Auth credentials/IP config/Server IPs can be stored in a
>             file
>             If TPM is present, it can be used to store auth info
>
> IT Install (may not have flash or disk)
>     DHCP Addressing
>         Flash/Disk Storage/TPM
>             Auth credentials can be stored in a file or in TPM
>         No Persistent Storage (CD or PXE Boot)
>             Auth can be stored in the host image (separate image for
>             each host) (NOTE: Not secure for PXE boot. Recommend that
>             we only allow CD boot in this case)
>     Static Addressing
>         Flash/Disk Storage/TPM
>             Auth credentials/IP config/Server IPs can be stored in a
>             file
>             If TPM is present, it can be used to store auth info
>         No Persistent Storage (CD or PXE Boot)
>             Auth/IP config/Server IPs can be stored in the host image
>             (separate image for each host) (NOTE: Not secure for PXE
>             boot, as above)

The 2 PXE boot methods can be secure given a deployment scenario where
the network admin has a dedicated PXE management LAN for their hosts, so
PXE traffic would be separated from general guest / user traffic. It's
probably not viable to mandate a separate 'management network', but we
should document it as a deployment scenario.
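Something like this is what I have in mind for the combined approach:
blacklist whole subtrees, whitelist the odd path we still need inside
them. All the paths here are purely illustrative:

    import os

    BLACKLIST = ["/usr/share/doc", "/usr/share/man", "/usr/lib/locale"]
    WHITELIST = ["/usr/share/doc/ovirt-host"]  # rescued inside a blacklisted tree

    def prune(root):
        """Prune blacklisted subtrees from an image rooted at 'root',
        keeping anything under a whitelisted path."""
        for black in BLACKLIST:
            top = root + black
            if not os.path.isdir(top):
                continue
            for dirpath, dirnames, filenames in os.walk(top, topdown=False):
                rel = "/" + os.path.relpath(dirpath, root)
                if any(rel == w or rel.startswith(w + "/") for w in WHITELIST):
                    continue
                for name in filenames:
                    os.remove(os.path.join(dirpath, name))
                try:
                    os.rmdir(dirpath)  # succeeds only if nothing was rescued below
                except OSError:
                    pass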
> 5. KVM Work
>
> * Need to work on support for Live Migration
> * Need to make sure that power management and suspend/resume of guests
>   work as expected

* dmidecode support so we can expose a UUID to the guest OS
* inter-VM socket family. Like AF_UNIX, but between guests (or perhaps
  just between guest & host). Call it AF_VIRT or some such. Want a
  stream-based protocol, but optionally datagram too. Some use cases for
  this:
  - Fast channel to feed performance stats from the guest OS to the host
  - Used to build the cluster fence agent
  - Other... ?

> 7. Performance Monitoring/Auditing/Health
>
> * collectd is used presently to send performance stats to the
>   management server. Is this the right solution? I don't have a better
>   one, but suggestions are appreciated.

collectd is the best option I know of for flexible, lightweight
monitoring.

> * libvirt talks to auditd to send audit info to the FreeIPA server

To clarify, there's no explicit dependency chain here. Libvirt would
simply send data to the audit daemon. The audit daemon can optionally be
configured to ship data off to FreeIPA.

> * Need to create a health monitoring daemon that communicates with the
>   mgmt server:
>   - Sends a heartbeat at a configurable interval
>   - Sends status changes of the host (including machine check
>     exceptions and other errors)
>   - Sends VM state changes

This last point should arguably be supported by libvirt directly.

> 8. Clustering
>
> Physical clustering is not particularly useful in and of itself, since
> the whole point of what we're trying to do is abstract the applications
> further from the hardware.

You fundamentally need at least the fencing part of clustering on the
physical hosts. Without this you have no guarantee that the host & the
guests that were running on it are actually dead. You need this guarantee
before starting the guests on a new host.

> Clustering of guests should be done. The hardware fence would be
> replaced by a paravirt driver in the guest that communicates with the
> host. If two machines A and B are clustered and on different physical
> hardware, A is the primary and B is the backup. B monitors the
> application on A. If A appears to go down, B uses the PV driver to tell
> the host that it needs to kill A. The host tells libvirt that A needs
> to be destroyed, and that command is sent over libvirt comms to the
> physical host that is hosting A.

The basic goal here is that the oVirt host should provide a general
purpose virtual fence device to all guests. The guest admin decides which
of their guests are in the same cluster & does some config task with the
driver to set this up. Since oVirt knows what VMs each user owns, it can
validate the config to ensure the admin isn't trying to cluster someone
else's VMs. The driver will then provide the guest admin the ability to
have one guest shoot another guest in the head securely.

The key is there must be *zero* need for the guest admin to actually
interact with the host admin. The host image should support this all
automatically, so guest admins can set up clustering in their guests at
will.
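Purely to illustrate the model, a guest-side fence agent might end up
looking something like this. The device name and request format are
invented, since none of this exists yet:

    FENCE_DEV = "/dev/ovirt-fence"  # hypothetical PV fence device in the guest

    def fence_peer(vm_uuid):
        """Ask the host to destroy a cluster peer we own (invented protocol)."""
        with open(FENCE_DEV, "w") as dev:
            dev.write("fence %s\n" % vm_uuid)
        # Ownership checks (may this admin fence that VM?) happen host-side
        # in oVirt, so the guest admin never needs to involve the host admin.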
Dan.

--
|: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


On Sun, Mar 16, 2008 at 11:43:02PM -0400, Perry N. Myers wrote:

Wow that's a big mail <g>. I have a couple of general comments based on
the way we *have been* thinking about this stuff (this of course doesn't
mean we can't change).

First off, we have been imagining the oVirt host image as stateless
except for whatever we use to store its identity (if we even do that).
This means we simply don't support static addressing of oVirt hosts, at
all.

The appealing thing about a stateless oVirt host image to me is that it
means the host is immediately and easily upgradeable; there is zero local
state to worry about aside from the Kerberos keytab. It also means that a
reboot will immediately restore a compromised machine, FWIW.

Having said all that, if there is a compelling reason to move away from
the stateless idea I'm all ears. Further comments in-line below...

> 1. Image Creation:
>
> Right now the oVirt Host is created using scripts in the oVirt host
> repository that use livecd-creator to make ISO, USB and PXE formats.
>
> I think it makes sense to start with the tools that Chris L. has
> already been working with and migrate new functionality in, rather than
> starting with other tools.
>
> a. Integrate whitelist functionality into the image creation process to
>    reduce image size.
> b. Determine the minimal set of files that the host image needs, to
>    construct a baseline whitelist.
> c. Determine the minimal set of kernel modules to support a wide
>    platform base (i.e. keep disk and net modules; remove things like
>    sound, isdn, etc...)
> d. Construct a repeatable build process so that images can be built
>    daily for testing.
> e. Construct a utility that detects the minimal set of kernel modules
>    for a given platform and generates a config file that can be fed to
>    the build system for generating hardware-platform-specific images.
> f. Construct a method for taking the generic host image and removing
>    all kernel modules not applicable to a specific platform.
> g. Construct a method for taking an image and adding site-specific
>    configuration information to it (IP addresses of the oVirt Mgmt
>    Server and FreeIPA Server). This is only necessary to support hosts
>    that require static IP configuration -and- do not have persistent
>    storage to store configuration information. (If we decide to support
>    static IP config we could make the presence of persistent storage a
>    requirement...)

You could add "Kerberos keytab" to "site-specific configuration
information" and the host image would be completely self-contained. I can
imagine a library of unique images under /tftpboot...?
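Very roughly, the per-host plumbing on the PXE server might look like
this (pxelinux looks up configs by "01-" plus the MAC with dashes; the
directory layout and image naming are invented):

    import os

    TFTPBOOT = "/tftpboot"

    def write_host_entry(mac, image_dir):
        """Point one host's pxelinux config at its host-specific image."""
        name = "01-" + mac.lower().replace(":", "-")
        cfg = os.path.join(TFTPBOOT, "pxelinux.cfg", name)
        with open(cfg, "w") as f:
            f.write("DEFAULT ovirt\n")
            f.write("LABEL ovirt\n")
            f.write("  KERNEL %s/vmlinuz\n" % image_dir)
            f.write("  APPEND initrd=%s/initrd.img\n" % image_dir)

    # e.g. write_host_entry("00:16:3e:12:34:56", "images/host-00163e123456")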
> 2. Installation
>
> We have to account for two methods of distributing oVirt:
> pre-installation by OEMs and install by IT admins. Pre-install by OEM
> implies that the platform has flash or a reserved partition on the
> primary hard disk. Install by IT can be done by setting up PXE boot,
> adding an external bootable USB disk, booting to CD or installing to
> onboard flash/disk.
>
> The livecd/liveusb disk should allow either booting and running
> directly or installation onto platform flash/disk.

That's a nice addition, yeah. Although again I'm in favor of the running
image itself remaining stateless.

> 3. Persistent Configuration Storage
>
> Since the host is non-persistent and may or may not have access to
> persistent storage, this could become tricky. It's further complicated
> by the need to support both DHCP and static addressing of the host.
>
> The following things need to be stored for the host:
> a. Kerberos keytab or SSL certificate
> b. IP configuration (if using static configuration)
> c. IP addresses of the FreeIPA server and oVirt Management Server (if
>    using static configuration)
>
> Here's where things could be much simpler if DHCP were required.
> Without static addressing, b is not necessary and c can be transmitted
> as custom DHCP fields (iirc). Then the only thing that needs to be
> cached on persistent storage on the host is the auth credentials.
> (NOTE: Does Microsoft's DHCP server allow custom DHCP fields???)

Actually we've talked about moving away from overloading DHCP for this
stuff and instead handling it with mdns (let zeroconf be zeroconf).

There is another interesting possibility we might want to look at in the
medium term. We are actively working on ways to use amqp as part of our
infrastructure. If a machine has an identity via a keytab and an IP
address via DHCP, I imagine it's possible to retrieve the remaining
necessary config information by sending a message to a known alias? This
relieves the oVirt host of having to know the precise address of the
management server. Something to think about at least.
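On the mdns idea, here's roughly what discovery could look like on the
host side, assuming the python zeroconf bindings and an invented service
type for the management server:

    import socket
    from zeroconf import ServiceBrowser, Zeroconf  # third-party bindings

    SERVICE = "_ovirt-mgmt._tcp.local."  # invented service type

    class MgmtListener:
        def add_service(self, zc, type_, name):
            # Resolve the advertised service to an address and port
            info = zc.get_service_info(type_, name)
            if info:
                addr = socket.inet_ntoa(info.addresses[0])
                print("found oVirt mgmt server at %s:%d" % (addr, info.port))

        def remove_service(self, zc, type_, name):
            pass

        def update_service(self, zc, type_, name):
            pass

    zc = Zeroconf()
    browser = ServiceBrowser(zc, SERVICE, MgmtListener())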
> 4. Initial Configuration
>
> When the host comes up the first time it needs to know the IP address
> and password of the oVirt Mgmt server so it can get its auth
> credentials set up. Here's how a typical setup might look:
>
> a. Cable up the new hardware and get MAC addrs for each iface. Need to
>    record which MAC is for:
>    * management network (NOTE: should be the only iface that PXE boots
>      if PXE is being used)
>    * storage network
>    * normal networks
>    (these don't all have to be separate; there could be just a single
>    interface on the box used for management, storage and normal
>    traffic)
> b. Set up DNS/DHCP servers. If using static addressing, skip DHCP
>    setup. (This is a manual step that the IT admin would do, i.e. we're
>    not trying to automate this process.)
> c. Boot the oVirt Host for the first time. The kernel cmdline takes
>    options like:
>    * ovirt_net=static (to indicate that static IP config should be
>      prompted for during boot)
>    * ovirt_auth_pw=<password> (password to use when connecting to the
>      oVirt Mgmt server to register the host and retrieve auth
>      information)
>    * ovirt_init (this is specified to let the host know that initial
>      config needs to happen; otherwise a normal boot occurs)

Getting back to the stateless idea, I think the host image should always
boot exactly the same way and be smart enough to know whether it needs
config information or not and, if so, ask for it? Having said that, the
"password" idea is reasonable, except that you would always need to be
there to enter the password for a reboot...

> f. The host is checked for persistent storage and TPM devices. If a TPM
>    is found it is initialized so that auth info can be stored there.
>    * If either a USB or hard disk is found, it is checked for a
>      partition with the label "OVIRT". If that exists it is used for
>      persistent storage. If it does not, an "OVIRT" partition is
>      created on available unpartitioned space. (How big should it be?)
>    * The USB/disk is also checked for a swap partition. If none is
>      found, a swap partition is created using unpartitioned space.

Hadn't thought about using the TPM for the keytab; that's a neat idea.
I'm a bit leery of using any local HDD on the machine for it though.

> 7. Performance Monitoring/Auditing/Health
>
> * collectd is used presently to send performance stats to the
>   management server. Is this the right solution? I don't have a better
>   one, but suggestions are appreciated.

There is some really interesting stuff going on around monitoring with
amqp. As a first step we need to try to get collectd hooked into that.

> * Need to create a health monitoring daemon that communicates with the
>   mgmt server:
>   - Sends a heartbeat at a configurable interval
>   - Sends status changes of the host (including machine check
>     exceptions and other errors)
>   - Sends VM state changes

This is where the amqp stuff might be particularly useful, although we
need something in place before July (which is the earliest we can really
start looking at AMQP, it sounds like).

> 8. Clustering
>
> This probably needs a lot of refinement. So someone who knows
> clustering better than I do should feel free to help out here...

I'm afraid that someone would not be me <g>...

Take care,
--Hugh