Here's an initial cut at an oVirt Host task list. After we refine this a
little with some discussion, I'll prioritize these.

One thing that complicates the design is supporting static addressing of
oVirt hosts. Making the assumption that DHCP can be used and configured for
oVirt makes things much simpler. I won't make the DHCP assumption below,
but would appreciate people's thoughts on whether or not static addressing
support is important.

1. Image Creation:

Right now the oVirt Host is created using scripts in the oVirt host
repository that use livecd-creator to make ISO, USB and PXE formats.

I think it makes sense to start with the tools that Chris L. has already
been working with and migrate new functionality in, rather than starting
with other tools.

a. Integrate whitelist functionality into the image creation process to
   reduce image size.
b. Determine the minimal set of files that the host image needs, to
   construct a baseline whitelist.
c. Determine the minimal set of kernel modules to support a wide platform
   base (i.e. keep disk and net modules; remove things like sound, isdn,
   etc...)
d. Construct a repeatable build process so that images can be built daily
   for testing.
e. Construct a utility that detects the minimal set of kernel modules for
   a given platform and generates a config file that can be fed to the
   build system for generating hardware-platform-specific images.
f. Construct a method for taking the generic host image and removing all
   kernel modules not applicable to a specific platform.
g. Construct a method for taking an image and adding site-specific
   configuration information to it (IP addresses of the oVirt Mgmt Server
   and FreeIPA Server). This is only necessary to support hosts that
   require static IP configuration -and- do not have persistent storage
   to store configuration information. (If we decide to support static IP
   config we could make the presence of persistent storage a
   requirement...)

The goal is to get the disk footprint down to something like 32, 64 or
128MB. The base images we build will not be hardware platform specific,
so 32MB might be a long shot. But after step e) above, we should fit into
32MB.

2. Installation

We have to account for two methods of distributing oVirt: pre-installation
by OEMs and install by IT admins. Pre-install by OEM implies that the
platform has flash or a reserved partition on the primary hard disk.
Install by IT can be done by setting up PXE boot, adding an external
bootable USB disk, booting to CD or installing to onboard flash/disk.

The livecd/liveusb disk should allow either booting and running directly
or installation onto platform flash/disk.

3. Persistent Configuration Storage

Since the host is non-persistent and may or may not have access to
persistent storage, this could become tricky. It's further complicated by
the need to support both DHCP and static addressing of the host.

The following things need to be stored for the host:
a. Kerberos keytab or SSL certificate
b. IP configuration (if using static configuration)
c. IP addresses of the FreeIPA server and oVirt Management Server (if
   using static configuration)

Here's where things could be much simpler if DHCP were required. Without
static addressing, b is not necessary and c can be transmitted as custom
DHCP fields (iirc). Then the only thing that needs to be cached on
persistent storage on the host is the auth credentials. (NOTE: Does
Microsoft's DHCP server allow custom DHCP fields???)
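To make the custom-fields idea concrete, here's a rough sketch of pulling
the two server addresses out of the option area of a DHCP reply. The
option codes 224/225 are arbitrary picks from the site-specific range,
not anything we've agreed on:

    import socket

    OVIRT_MGMT_OPT = 224  # hypothetical code for the oVirt Mgmt Server IP
    OVIRT_IPA_OPT = 225   # hypothetical code for the FreeIPA server IP

    def parse_dhcp_options(options):
        """Walk the TLV-encoded option area of a DHCP reply (the bytes
        after the magic cookie) and return a {code: raw value} dict."""
        result, i = {}, 0
        while i < len(options):
            code = options[i]
            if code == 0:        # pad option, single byte
                i += 1
                continue
            if code == 255:      # end option
                break
            length = options[i + 1]
            result[code] = options[i + 2:i + 2 + length]
            i += 2 + length
        return result

    def ovirt_servers(options):
        """Return (mgmt server IP, FreeIPA server IP), or None for each
        option the DHCP server didn't send."""
        opts = parse_dhcp_options(options)
        mgmt = opts.get(OVIRT_MGMT_OPT)
        ipa = opts.get(OVIRT_IPA_OPT)
        return (socket.inet_ntoa(mgmt) if mgmt else None,
                socket.inet_ntoa(ipa) if ipa else None)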
OEM Pre-Install (will always have flash/disk storage)
    DHCP Addressing
        Flash/Disk Storage/TPM
            Auth credentials can be stored in a file or in TPM
    Static Addressing
        Flash/Disk Storage/TPM
            Auth credentials/IP config/Server IPs can be stored in a file
            If TPM is present, it can be used to store auth info

IT Install (may not have flash or disk)
    DHCP Addressing
        Flash/Disk Storage/TPM
            Auth credentials can be stored in a file or in TPM
        No Persistent Storage (CD or PXE Boot)
            Auth can be stored in the host image (separate image for each
            host) (NOTE: Not secure for PXE boot. Recommend that we only
            allow CD boot in this case)
    Static Addressing
        Flash/Disk Storage/TPM
            Auth credentials/IP config/Server IPs can be stored in a file
            If TPM is present, it can be used to store auth info
        No Persistent Storage (CD or PXE Boot)
            Auth/IP config/Server IPs can be stored in the host image
            (separate image for each host) (NOTE: Not secure for PXE
            boot, as above)

4. Initial Configuration

When the host comes up the first time it needs to know the IP address and
password of the oVirt Mgmt server so it can get its auth credentials set
up. Here's how a typical setup might look:

a. Cable up the new hardware and get MAC addrs for each iface. Need to
   record which MAC is for:
   * management network (NOTE: should be the only iface that PXE boots if
     PXE is being used)
   * storage network
   * normal networks
   (these don't all have to be separate; there could be just a single
   interface on the box used for management, storage and normal traffic)
b. Set up DNS/DHCP servers. If using static addressing, skip DHCP setup.
   (This is a manual step that the IT admin would do, i.e. we're not
   trying to automate this process.)
c. Boot the oVirt Host for the first time. The kernel cmdline takes
   options like (see the sketch after this list):
   * ovirt_net=static (to indicate that static IP config should be
     prompted for during boot)
   * ovirt_auth_pw=<password> (password to use when connecting to the
     oVirt Mgmt server to register the host and retrieve auth
     information)
   * ovirt_init (this is specified to let the host know that initial
     config needs to happen; otherwise a normal boot occurs)
d. During boot, if ovirt_net=static is specified, the admin is prompted
   for the IP config for each interface and for which interface is the
   mgmt interface. This can be done with an anaconda-style interface.
e. Auth Setup:
   * If ovirt_auth_pw is not specified: auth information is expected to
     be either on the host disk image or on a connected USB key. (It
     could be that the oVirt host image is on internal flash/disk and the
     admin supplies auth info on a USB thumb drive.)
   * If ovirt_auth_pw is specified: after the network comes up, the mgmt
     server is contacted and the host is registered using the supplied
     password. The auth information is sent back to the oVirt Host.
f. The host is checked for persistent storage and TPM devices. If a TPM
   is found it is initialized so that auth info can be stored there.
   * If either a USB or hard disk is found, it is checked for a partition
     with the label "OVIRT". If that exists it is used for persistent
     storage. If it does not, an "OVIRT" partition is created on
     available unpartitioned space. (How big should it be?)
   * The USB/disk is also checked for a swap partition. If none is found,
     a swap partition is created using unpartitioned space.
g. The host stores auth/IP information on persistent storage/TPM for
   future boots.
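A minimal sketch of how the first-boot code might pick those options out
of the kernel command line (the option names are the ones proposed above;
everything else is illustrative):

    def parse_ovirt_cmdline(path="/proc/cmdline"):
        """Collect ovirt_* options from the kernel command line."""
        opts = {}
        with open(path) as f:
            for token in f.read().split():
                if not token.startswith("ovirt_"):
                    continue
                if "=" in token:
                    key, value = token.split("=", 1)
                    opts[key] = value
                else:
                    opts[token] = True  # bare flag, e.g. ovirt_init
        return opts

    # e.g. {'ovirt_init': True, 'ovirt_net': 'static', 'ovirt_auth_pw': '...'}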
5. KVM Work

* Need to work on support for Live Migration
* Need to make sure that power management and suspend/resume of guests
  work as expected

6. Storage Support

* libvirt storage API to support fibre channel
* Work on using LVM on top of iSCSI so that we don't need to have a 1-1
  mapping between LUNs and guests
* libvirt already supports NFS; we just need to put plumbing in place so
  that oVirt also supports NFS. (This is more of an issue for the
  management interface than for the oVirt Host.)

7. Performance Monitoring/Auditing/Health

* collectd is used presently to send performance stats to the management
  server. Is this the right solution? I don't have a better one, but
  suggestions are appreciated.
* libvirt talks to auditd to send audit info to the FreeIPA server
* Need to create a health monitoring daemon that communicates with the
  mgmt server:
  - Sends a heartbeat at a configurable interval
  - Sends status changes of the host (including machine check exceptions
    and other errors)
  - Sends VM state changes

8. Clustering

I'm light on clustering knowledge, so fill me in if I'm talking crazy
nonsense...

Two types of clustering: physical machines and virtual machines.

Physical clustering is not particularly useful in and of itself, since
the whole point of what we're trying to do is abstract the applications
further from the hardware.

Clustering of guests should be done. The hardware fence would be replaced
by a paravirt driver in the guest that communicates with the host. If two
machines A and B are clustered and on different physical hardware, A is
the primary and B is the backup. B monitors the application on A. If A
appears to go down, B uses the PV driver to tell the host that it needs
to kill A. The host tells libvirt that A needs to be destroyed, and that
command is sent over libvirt comms to the physical host that is hosting
A.

If the physical hardware hosting A is down, that libvirt command won't
succeed. So in this case, a hardware fence would be required to make sure
that the physical host is disconnected and does not power on again.

This probably needs a lot of refinement. So someone who knows clustering
better than I do should feel free to help out here...

9. Image Updates

Host images should be updated as binary blobs (initially a complete
overwrite, and eventually perhaps a binary diff). Updating can be done in
a few ways:

a. For machines that PXE boot, this is simply putting a new PXE image in
   place.
b. For machines that boot from CD, a new livecd needs to be created and
   physically put in the machines.
c. For machines that boot from USB/hard disk, an update daemon can run on
   the host that polls the mgmt server for new images (rough sketch
   below). When a new image is found it is retrieved and stored on the
   persistent disk, and grub is updated to boot to it. If boot of the new
   image fails, the next boot falls back to the old image. (This means
   that persistent storage should be twice the compressed image size,
   plus some additional room for config.)
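To give a feel for 9c, a very rough sketch of the poll/fetch loop. The
URL layout, file names and the grub fallback wiring are all invented for
illustration:

    import os, time, urllib.request

    MGMT_URL = "https://mgmt.example.com/ovirt"  # hypothetical mgmt server
    STORE = "/mnt/ovirt"                         # the persistent "OVIRT" partition

    def current_version():
        try:
            with open(os.path.join(STORE, "image.version")) as f:
                return f.read().strip()
        except IOError:
            return ""

    def poll_once():
        with urllib.request.urlopen(MGMT_URL + "/host-image/latest") as resp:
            latest = resp.read().decode().strip()
        if latest == current_version():
            return
        # Keep the new blob next to the running one; grub fallback entries
        # (not shown) let the old image boot if the new one fails.
        with urllib.request.urlopen(MGMT_URL + "/host-image/" + latest) as resp, \
             open(os.path.join(STORE, "image-%s.img" % latest), "wb") as out:
            out.write(resp.read())
        with open(os.path.join(STORE, "image.version"), "w") as f:
            f.write(latest)

    if __name__ == "__main__":
        while True:
            poll_once()
            time.sleep(3600)  # the poll interval would be configurable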
That's a mouthful. Thoughts and criticisms appreciated.

Perry

--
|=- Red Hat, Engineering, Emerging Technologies, Boston -=|
|=- Email: pmyers at redhat.com -=|
|=- Office: +1 412 474 3552 Mobile: +1 703 362 9622 -=|
|=- GnuPG: E65E4F3D 88F9 F1C9 C2F3 1303 01FE 817C C5D2 8B91 E65E 4F3D -=|


Some things I left off that are important...

> 5. KVM Work
>
> * Need to work on support for Live Migration
> * Need to make sure that power management and suspend/resume of guests
>   work as expected

* Ballooning memory support
* PV Drivers (disk and net) including drivers for Windows

Perry
On Sun, Mar 16, 2008 at 11:43:02PM -0400, Perry N. Myers wrote:
> 1. Image Creation:
>
> a. Integrate whitelist functionality into the image creation process to
>    reduce image size.
> b. Determine the minimal set of files that the host image needs, to
>    construct a baseline whitelist.

IMHO, a pure whitelist approach is not very maintainable long term. As we
track new Fedora RPM updates, new files often appear & we'll miss them
with a pure whitelist. I'd go for a combined white/black list so we can
be broadly inclusive and only do fine-grained whitelisting in specific
areas (rough sketch below).

> c. Determine the minimal set of kernel modules to support a wide
>    platform base (i.e. keep disk and net modules; remove things like
>    sound, isdn, etc...)
> d. Construct a repeatable build process so that images can be built
>    daily for testing.
> e. Construct a utility that detects the minimal set of kernel modules
>    for a given platform and generates a config file that can be fed to
>    the build system for generating hardware-platform-specific images.
> f. Construct a method for taking the generic host image and removing
>    all kernel modules not applicable to a specific platform.

I'd put e) & f) fairly low on the list, unless someone has a compelling
short term need to make a *tiny* embedded image for specific hardware.

> g. Construct a method for taking an image and adding site-specific
>    configuration information to it (IP addresses of the oVirt Mgmt
>    Server and FreeIPA Server). This is only necessary to support hosts
>    that require static IP configuration -and- do not have persistent
>    storage to store configuration information. (If we decide to support
>    static IP config we could make the presence of persistent storage a
>    requirement...)
>
> The goal is to get the disk footprint down to something like 32, 64 or
> 128MB. The base images we build will not be hardware platform specific,
> so 32MB might be a long shot. But after step e) above, we should fit
> into 32MB.

64MB is probably a worthy immediate goal to aim for given that the
current live image is about 85MB & we know there's stuff that can be
killed. Again, unless we have a compelling hardware use case that
requires 32MB, I'd not bother aiming for the 32MB mark in the
short-medium term. There's other more broadly useful stuff we can do....

> OEM Pre-Install (will always have flash/disk storage)
>     DHCP Addressing
>         Flash/Disk Storage/TPM
>             Auth credentials can be stored in a file or in TPM
>     Static Addressing
>         Flash/Disk Storage/TPM
>             Auth credentials/IP config/Server IPs can be stored in a
>             file
>             If TPM is present, it can be used to store auth info
>
> IT Install (may not have flash or disk)
>     DHCP Addressing
>         Flash/Disk Storage/TPM
>             Auth credentials can be stored in a file or in TPM
>         No Persistent Storage (CD or PXE Boot)
>             Auth can be stored in the host image (separate image for
>             each host) (NOTE: Not secure for PXE boot. Recommend that
>             we only allow CD boot in this case)
>     Static Addressing
>         Flash/Disk Storage/TPM
>             Auth credentials/IP config/Server IPs can be stored in a
>             file
>             If TPM is present, it can be used to store auth info
>         No Persistent Storage (CD or PXE Boot)
>             Auth/IP config/Server IPs can be stored in the host image
>             (separate image for each host) (NOTE: Not secure for PXE
>             boot, as above)

The 2 PXE boot methods can be secure given a deployment scenario where
the network admin has a dedicated PXE management LAN for their hosts, so
PXE traffic would be separated from general guest / user traffic. It's
probably not viable to mandate a separate 'management network', but we
should document it as a deployment scenario.
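Something like this is what I have in mind for the combined approach:
blacklist whole subtrees, whitelist the odd path we still need inside
them. All the paths here are purely illustrative:

    import os

    BLACKLIST = ["/usr/share/doc", "/usr/share/man", "/usr/lib/locale"]
    WHITELIST = ["/usr/share/doc/ovirt-host"]  # rescued inside a blacklisted tree

    def prune(root):
        """Prune blacklisted subtrees from an image rooted at 'root',
        keeping anything under a whitelisted path."""
        for black in BLACKLIST:
            top = root + black
            if not os.path.isdir(top):
                continue
            for dirpath, dirnames, filenames in os.walk(top, topdown=False):
                rel = "/" + os.path.relpath(dirpath, root)
                if any(rel == w or rel.startswith(w + "/") for w in WHITELIST):
                    continue
                for name in filenames:
                    os.remove(os.path.join(dirpath, name))
                try:
                    os.rmdir(dirpath)  # succeeds only if nothing was rescued below
                except OSError:
                    pass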
> 5. KVM Work
>
> * Need to work on support for Live Migration
> * Need to make sure that power management and suspend/resume of guests
>   work as expected

* dmidecode support so we can expose a UUID to the guest OS
* inter-VM socket family. Like AF_UNIX, but between guests (or perhaps
  just between guest & host). Call it AF_VIRT or some such. Want a
  stream-based protocol, but optionally datagram too. Some use cases for
  this:
  - Fast channel to feed performance stats from the guest OS to the host
  - Used to build the cluster fence agent
  - Other... ?

> 7. Performance Monitoring/Auditing/Health
>
> * collectd is used presently to send performance stats to the
>   management server. Is this the right solution? I don't have a better
>   one, but suggestions are appreciated.

collectd is the best option I know of for flexible, lightweight
monitoring.

> * libvirt talks to auditd to send audit info to the FreeIPA server

To clarify, there's no explicit dependency chain here. Libvirt would
simply send data to the audit daemon. The audit daemon can optionally be
configured to ship data off to FreeIPA.

> * Need to create a health monitoring daemon that communicates with the
>   mgmt server:
>   - Sends a heartbeat at a configurable interval
>   - Sends status changes of the host (including machine check
>     exceptions and other errors)
>   - Sends VM state changes

This last point should arguably be supported by libvirt directly.

> 8. Clustering
>
> Physical clustering is not particularly useful in and of itself, since
> the whole point of what we're trying to do is abstract the applications
> further from the hardware.

You fundamentally need at least the fencing part of clustering on the
physical hosts. Without this you have no guarantee that the host & the
guests that were running on it are actually dead. You need this guarantee
before starting the guests on a new host.

> Clustering of guests should be done. The hardware fence would be
> replaced by a paravirt driver in the guest that communicates with the
> host. If two machines A and B are clustered and on different physical
> hardware, A is the primary and B is the backup. B monitors the
> application on A. If A appears to go down, B uses the PV driver to tell
> the host that it needs to kill A. The host tells libvirt that A needs
> to be destroyed, and that command is sent over libvirt comms to the
> physical host that is hosting A.

The basic goal here is that the oVirt host should provide a general
purpose virtual fence device to all guests. The guest admin decides which
of their guests are in the same cluster & does some config task with the
driver to set this up. Since oVirt knows what VMs each user owns, it can
validate the config to ensure the admin isn't trying to cluster someone
else's VMs. The driver will then provide the guest admin the ability to
have one guest shoot another guest in the head securely.

The key is there must be *zero* need for the guest admin to actually
interact with the host admin. The host image should support this all
automatically, so guest admins can set up clustering in their guests at
will.
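Purely to illustrate the model, a guest-side fence agent might end up
looking something like this. The device name and request format are
invented, since none of this exists yet:

    FENCE_DEV = "/dev/ovirt-fence"  # hypothetical PV fence device in the guest

    def fence_peer(vm_uuid):
        """Ask the host to destroy a cluster peer we own (invented protocol)."""
        with open(FENCE_DEV, "w") as dev:
            dev.write("fence %s\n" % vm_uuid)
        # Ownership checks (may this admin fence that VM?) happen host-side
        # in oVirt, so the guest admin never needs to involve the host admin.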
Dan.

--
|: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|


On Sun, Mar 16, 2008 at 11:43:02PM -0400, Perry N. Myers wrote:

Wow that's a big mail <g>. I have a couple of general comments based on
the way we *have been* thinking about this stuff (this of course doesn't
mean we can't change).

First off, we have been imagining the oVirt host image as stateless
except for whatever we use to store its identity (if we even do that).
This means we simply don't support static addressing of oVirt hosts, at
all.

The appealing thing about a stateless oVirt host image to me is that it
means the host is immediately and easily upgradeable; there is zero local
state to worry about aside from the Kerberos keytab. It also means that a
reboot will immediately restore a compromised machine, FWIW.

Having said all that, if there is a compelling reason to move away from
the stateless idea I'm all ears. Further comments in-line below...

> 1. Image Creation:
>
> Right now the oVirt Host is created using scripts in the oVirt host
> repository that use livecd-creator to make ISO, USB and PXE formats.
>
> I think it makes sense to start with the tools that Chris L. has
> already been working with and migrate new functionality in, rather than
> starting with other tools.
>
> a. Integrate whitelist functionality into the image creation process to
>    reduce image size.
> b. Determine the minimal set of files that the host image needs, to
>    construct a baseline whitelist.
> c. Determine the minimal set of kernel modules to support a wide
>    platform base (i.e. keep disk and net modules; remove things like
>    sound, isdn, etc...)
> d. Construct a repeatable build process so that images can be built
>    daily for testing.
> e. Construct a utility that detects the minimal set of kernel modules
>    for a given platform and generates a config file that can be fed to
>    the build system for generating hardware-platform-specific images.
> f. Construct a method for taking the generic host image and removing
>    all kernel modules not applicable to a specific platform.
> g. Construct a method for taking an image and adding site-specific
>    configuration information to it (IP addresses of the oVirt Mgmt
>    Server and FreeIPA Server). This is only necessary to support hosts
>    that require static IP configuration -and- do not have persistent
>    storage to store configuration information. (If we decide to support
>    static IP config we could make the presence of persistent storage a
>    requirement...)

You could add "Kerberos keytab" to "site-specific configuration
information" and the host image would be completely self-contained. I can
imagine a library of unique images under /tftpboot...?
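Very roughly, the per-host plumbing on the PXE server might look like
this (pxelinux looks up configs by "01-" plus the MAC with dashes; the
directory layout and image naming are invented):

    import os

    TFTPBOOT = "/tftpboot"

    def write_host_entry(mac, image_dir):
        """Point one host's pxelinux config at its host-specific image."""
        name = "01-" + mac.lower().replace(":", "-")
        cfg = os.path.join(TFTPBOOT, "pxelinux.cfg", name)
        with open(cfg, "w") as f:
            f.write("DEFAULT ovirt\n")
            f.write("LABEL ovirt\n")
            f.write("  KERNEL %s/vmlinuz\n" % image_dir)
            f.write("  APPEND initrd=%s/initrd.img\n" % image_dir)

    # e.g. write_host_entry("00:16:3e:12:34:56", "images/host-00163e123456")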
> 2. Installation
>
> We have to account for two methods of distributing oVirt:
> pre-installation by OEMs and install by IT admins. Pre-install by OEM
> implies that the platform has flash or a reserved partition on the
> primary hard disk. Install by IT can be done by setting up PXE boot,
> adding an external bootable USB disk, booting to CD or installing to
> onboard flash/disk.
>
> The livecd/liveusb disk should allow either booting and running
> directly or installation onto platform flash/disk.

That's a nice addition, yeah. Although again I'm in favor of the running
image itself remaining stateless.

> 3. Persistent Configuration Storage
>
> Since the host is non-persistent and may or may not have access to
> persistent storage, this could become tricky. It's further complicated
> by the need to support both DHCP and static addressing of the host.
>
> The following things need to be stored for the host:
> a. Kerberos keytab or SSL certificate
> b. IP configuration (if using static configuration)
> c. IP addresses of the FreeIPA server and oVirt Management Server (if
>    using static configuration)
>
> Here's where things could be much simpler if DHCP were required.
> Without static addressing, b is not necessary and c can be transmitted
> as custom DHCP fields (iirc). Then the only thing that needs to be
> cached on persistent storage on the host is the auth credentials.
> (NOTE: Does Microsoft's DHCP server allow custom DHCP fields???)

Actually we've talked about moving away from overloading DHCP for this
stuff and instead handling it with mdns (let zeroconf be zeroconf).

There is another interesting possibility we might want to look at in the
medium term. We are actively working on ways to use amqp as part of our
infrastructure. If a machine has an identity via a keytab and an IP
address via DHCP, I imagine it's possible to retrieve the remaining
necessary config information by sending a message to a known alias? This
relieves the oVirt host of having to know the precise address of the
management server. Something to think about at least.
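On the mdns idea, here's roughly what discovery could look like on the
host side, assuming the python zeroconf bindings and an invented service
type for the management server:

    import socket
    from zeroconf import ServiceBrowser, Zeroconf  # third-party bindings

    SERVICE = "_ovirt-mgmt._tcp.local."  # invented service type

    class MgmtListener:
        def add_service(self, zc, type_, name):
            # Resolve the advertised service to an address and port
            info = zc.get_service_info(type_, name)
            if info:
                addr = socket.inet_ntoa(info.addresses[0])
                print("found oVirt mgmt server at %s:%d" % (addr, info.port))

        def remove_service(self, zc, type_, name):
            pass

        def update_service(self, zc, type_, name):
            pass

    zc = Zeroconf()
    browser = ServiceBrowser(zc, SERVICE, MgmtListener())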
> 4. Initial Configuration
>
> When the host comes up the first time it needs to know the IP address
> and password of the oVirt Mgmt server so it can get its auth
> credentials set up. Here's how a typical setup might look:
>
> a. Cable up the new hardware and get MAC addrs for each iface. Need to
>    record which MAC is for:
>    * management network (NOTE: should be the only iface that PXE boots
>      if PXE is being used)
>    * storage network
>    * normal networks
>    (these don't all have to be separate; there could be just a single
>    interface on the box used for management, storage and normal
>    traffic)
> b. Set up DNS/DHCP servers. If using static addressing, skip DHCP
>    setup. (This is a manual step that the IT admin would do, i.e. we're
>    not trying to automate this process.)
> c. Boot the oVirt Host for the first time. The kernel cmdline takes
>    options like:
>    * ovirt_net=static (to indicate that static IP config should be
>      prompted for during boot)
>    * ovirt_auth_pw=<password> (password to use when connecting to the
>      oVirt Mgmt server to register the host and retrieve auth
>      information)
>    * ovirt_init (this is specified to let the host know that initial
>      config needs to happen; otherwise a normal boot occurs)

Getting back to the stateless idea, I think the host image should always
boot exactly the same way and be smart enough to know whether it needs
config information or not and, if so, ask for it? Having said that, the
"password" idea is reasonable, except that you would always need to be
there to enter the password for a reboot...

> f. The host is checked for persistent storage and TPM devices. If a TPM
>    is found it is initialized so that auth info can be stored there.
>    * If either a USB or hard disk is found, it is checked for a
>      partition with the label "OVIRT". If that exists it is used for
>      persistent storage. If it does not, an "OVIRT" partition is
>      created on available unpartitioned space. (How big should it be?)
>    * The USB/disk is also checked for a swap partition. If none is
>      found, a swap partition is created using unpartitioned space.

Hadn't thought about using the TPM for the keytab; that's a neat idea.
I'm a bit leery of using any local HDD on the machine for it though.

> 7. Performance Monitoring/Auditing/Health
>
> * collectd is used presently to send performance stats to the
>   management server. Is this the right solution? I don't have a better
>   one, but suggestions are appreciated.

There is some really interesting stuff going on around monitoring with
amqp. As a first step we need to try to get collectd hooked into that.

> * Need to create a health monitoring daemon that communicates with the
>   mgmt server:
>   - Sends a heartbeat at a configurable interval
>   - Sends status changes of the host (including machine check
>     exceptions and other errors)
>   - Sends VM state changes

This is where the amqp stuff might be particularly useful, although we
need something in place before July (which is the earliest we can really
start looking at AMQP, it sounds like).

> 8. Clustering
>
> This probably needs a lot of refinement. So someone who knows
> clustering better than I do should feel free to help out here...

I'm afraid that someone would not be me <g>...

Take care,
--Hugh