Ian, et al.,
I've been doing some thinking about what taskomatic needs to do in its next
incarnation, along with ways of doing it.
WHAT:
1) Taskomatic needs to be able to run on multiple machines at the same time,
accessing a central database
2) Taskomatic needs to be able to fire off tasks relating to different VMs (or
storage pools) concurrently (whether it's just run on one machine or many).
HOW:
1) I think we should actually have two modes for taskomatic: standalone (i.e. I
am the only taskomatic) and multi-host (there are other taskomatics). The
reason for this is that in the standalone case, we probably want to fork one
taskomatic process for each VM (or storage pool) we want to perform actions on.
In the multi-host case, we don't know how many other taskomatics might be out
there doing tasks, so we keep one process per machine (this should be a
command-line option/config file option).
2) We need to lock rows in the database as each taskomatic wakes up and finds
work to do. Luckily, both Postgres and ActiveRecord support row locking, so the
underlying infrastructure is there.
In the standalone case, taskomatic should wake up, look at how many different
VMs (or storage pools) currently have tasks queued, and fork off that many
workers to do the work (i.e. if you have start_vm 1, start_vm 2, stop_vm 1 in
the queue, you would fork off two workers). Each worker would lock all of the
rows of the database corresponding to its VM (i.e. the first worker would lock
all rows having to do with VM 1), and then busy itself with executing the
actions for that VM serially. I guess the locking isn't strictly necessary
here, since we can tell each worker which VM or storage ID it should work on,
but it makes it more like the multi-host case.
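The fork-per-VM dispatch described above can be sketched in plain Ruby. This is only an illustration: task rows are stubbed as structs, and names like `partition_queue` and `run_worker` are invented here; a real taskomatic would load the rows via ActiveRecord and fork() actual worker processes.

```ruby
# Minimal sketch of the standalone dispatch logic. Task rows are stubbed;
# real taskomatic would pull them from the DB and fork a worker per VM.
QueuedTask = Struct.new(:action, :vm_id)

def partition_queue(tasks)
  # One work list per VM: start_vm 1, start_vm 2, stop_vm 1 => two workers
  tasks.group_by(&:vm_id)
end

def run_worker(vm_id, vm_tasks)
  # Each worker executes its VM's actions serially, in queue order.
  vm_tasks.map { |t| "#{t.action}(#{vm_id})" }
end

queue = [QueuedTask.new("start_vm", 1),
         QueuedTask.new("start_vm", 2),
         QueuedTask.new("stop_vm", 1)]
workers = partition_queue(queue)
results = workers.map { |vm_id, ts| run_worker(vm_id, ts) }
```

The example queue above yields exactly the two workers from the text: one handling both VM 1 actions in order, one handling VM 2.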
In the multi-host case, things are a bit simpler; the taskomatic running on
each individual machine would just wake up, find the first task that is not in
progress and not locked, and lock all task rows having to do with that VM. Then
it would execute those tasks and go back to looking for more tasks.
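The claim step can be simulated with an in-memory table to show the intended semantics. In the real system the find-and-lock would happen inside a transaction using Postgres row locks (`SELECT ... FOR UPDATE`, which ActiveRecord exposes via its pessimistic locking support); the field names here are hypothetical.

```ruby
# Sketch of the multihost claim step, simulated with an in-memory table.
# A real taskomatic would do this inside a DB transaction with row locks
# so that two hosts can't claim the same VM's rows.
def claim_next_vm!(tasks)
  first = tasks.find { |t| t[:state] == "queued" && !t[:locked] }
  return nil unless first
  claimed = tasks.select { |t| t[:vm_id] == first[:vm_id] && t[:state] == "queued" }
  # Lock all rows for that VM and mark them in progress.
  claimed.each { |t| t[:locked] = true; t[:state] = "in_progress" }
  claimed
end

table = [
  { vm_id: 1, action: "start_vm", state: "queued", locked: false },
  { vm_id: 2, action: "start_vm", state: "queued", locked: false },
  { vm_id: 1, action: "stop_vm",  state: "queued", locked: false },
]
batch = claim_next_vm!(table)   # claims both VM 1 tasks at once
```

A second caller (another host's taskomatic) would then claim VM 2's task, and a third would find nothing left to do.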
Note that in both standalone and multihost case, it's OK for multiple
taskomatics to be sending commands to identical managed nodes. Libvirtd itself
is serial, so commands might get intertwined, but that's OK since we are
explicitly making sure our taskomatics work on different VMs or storage pools.
3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this
one; we are modifying state external to the database, so I'm not sure
"rolling back" a transaction means a whole hill of beans to us. In fact, I
might argue that rolling back is worse in this case; if you modified external
state and then crashed, when you come back you might "roll back" your VM state
to something that's totally invalid, and it'll need to be corrected by
host-status anyway. Does anyone have further thoughts here?
THOUGHTS:
Interestingly, I think we can evolve the current taskomatic to do this, rather
than rewriting the thing from scratch. Since we cleaned up error handling and
reporting, I actually feel a lot better about the state of taskomatic. It
really just needs corner/error cases better handled, and then
introducing some of the above concepts one at a time. Is there anything in
taskomatic right now that people are particularly unhappy about that might
warrant a re-write?
Chris Lalancette
On Mon, Jun 23, 2008 at 4:06 AM, Chris Lalancette <clalance at redhat.com> wrote:
> 3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this
> one; we are modifying state external to the database, so I'm not sure
> "rolling-back" a transaction means a whole hill of beans to us. In fact, I
> might argue that rolling back is worse in this case; if you modified external
> state, and then crashed, when you come back you might "roll-back" your VM state
> to something that's totally invalid, and you'll need to be corrected by
> host-status anyway. Does anyone have further thoughts here?

Only that this is the tough nut I was unable to crack. And...

> THOUGHTS:
> Interestingly, I think we can evolve the current taskomatic to do this, rather
> than re-writing the thing from scratch. <snip>

...that the 2PC necessity (or not?) in taskomatic was the only issue that
warranted a rewrite in my mind. I've been away from taskomatic for a while
now; I will take a look at it again. I am glad to read that the exception
handling/reporting has been worked on; that was another thing that was an
issue.

Steve
Throwing in my thoughts/concerns (though I don't know taskomatic well, so
shoot them down if they are invalid).

Chris Lalancette wrote:
> <snip>
> 3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this
> one; we are modifying state external to the database, so I'm not sure
> "rolling-back" a transaction means a whole hill of beans to us. <snip>
> Does anyone have further thoughts here?

I actually thought I had more problems with this when I first read it, but
mostly this all seems very reasonable to me on second pass. I think if the
external state has been modified but the task failed or did not complete as
expected, instead of rolling back it would make more sense to _change_
whatever fields are associated with the changed state, so when the user sees
the WUI they do not think nothing has happened, but can instead see a message
showing what the real state is. As an example, this could be something like
'VM restart failed, trying again', or some such.

> THOUGHTS:
> <snip> Is there anything in
> taskomatic right now that people are particularly unhappy about that might
> warrant a re-write?

I mentioned this in IRC, but for those who didn't see it: what if, instead of
reading/writing directly to the database, taskomatic _and_ the WUI
communicate using an AMQP queue?
Let me attempt an example of how this might work, and how it could be useful.
I am creating a new VM in the WUI. I type in new information to start the
process, and the info is saved to the database. Now I need to wait for
taskomatic to do tasks x, y, and z before I can go on to do whatever else I
want to do (let's say I want to reboot the VM for some post-install config).

If we use queues/messaging, I as the user could immediately say I want to
reboot. This request is added to the list after x, y, and z (this list could
even be shown in the WUI, if appropriate). Taskomatic listens for new events
to be published on whatever channel it is subscribed to, and starts doing the
work as it receives notice. As it completes a task, it fires off a message and
any database changes are saved (this could either still be done by taskomatic,
or by something else that listens for completion messages and takes care of
these updates). It then grabs the next task, and so on. Meanwhile, the WUI
gets notified as each task is completed, so the user can see where they are in
the queue while continuing to do whatever else they might need to do.

A side benefit of this approach is that we may not have a locking issue at
all, since the apps work strictly from a queue instead of directly against the
db. A last thought is that this might also obviate the need for two different
modes (multi-machine or single), though I am not positive on that point.

-j
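The flow described above can be made concrete with a small Ruby sketch, using Ruby's thread-safe `Queue` as a stand-in for an AMQP broker (qpid). The WUI side publishes task messages; a taskomatic worker consumes them in order and publishes completion events the WUI could subscribe to. All names here are illustrative, not an actual qpid API.

```ruby
# Schematic of the queue-based WUI <-> taskomatic flow. Thread::Queue
# stands in for AMQP channels; no real broker is involved.
task_queue  = Queue.new   # WUI -> taskomatic
event_queue = Queue.new   # taskomatic -> WUI (completion notices)

worker = Thread.new do
  while (task = task_queue.pop)
    break if task == :shutdown
    # ... the real worker would perform the task via libvirt here ...
    event_queue << "completed: #{task}"   # fire a completion message
  end
end

# The user queues x, y, z and can immediately queue the reboot too.
%w[create_vm install_vm reboot_vm].each { |t| task_queue << t }
task_queue << :shutdown
worker.join

# The WUI drains completion events to show queue progress.
events = []
events << event_queue.pop until event_queue.empty?
```

Because the worker consumes strictly in order, the reboot only runs after the earlier tasks, which is exactly the ordering benefit being argued for.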
On Mon, Jun 23, 2008 at 11:06:08AM +0200, Chris Lalancette wrote:
> WHAT:
> 1) Taskomatic needs to be able to run on multiple machines at the same time,
> accessing a central database

This is an over-specialization - basically taskomatic needs to be parallelized
whether it runs on one or many machines.

> 2) Taskomatic needs to be able to fire off tasks relating to different VMs (or
> storage pools) concurrently (whether it's just run on one machine or many).

It strikes me that this is avoiding the more general problem - that there are
explicit dependencies between tasks. Serializing tasks per-VM is not
expressing this concept of dependencies directly.

So, as an example, a task starting a VM may have a dependency on a task to
start a storage pool (or refresh the volume list in an existing pool). Now,
while these 2 tasks are pending, another VM start task is scheduled which has
a dependency on the same storage task.

Or the admin may have some runtime policy to the effect that during the hours
9-5 they want VM 'x' to be running on a machine, and then at 5pm shutdown 'x'
and startup 'y' in its place. This has a strict ordering requirement between
the 2 VMs - they can't be scheduled independently because there won't be RAM
for 'y' until 'x' is shutdown.

> HOW:
> 1) I think we should actually have two modes for taskomatic: standalone (i.e. I
> am the only taskomatic), and multi-host (there are other taskomatics). The
> reason for this is in the standalone case, we probably want to fork one
> taskomatic process for each VM (or storage pool) we want to perform actions on.
> In the multi-host case, we don't know how many other taskomatics might be out
> there doing tasks, so we keep one process per machine (this should be a
> command-line option/config file option)

Having two modes is inserting an artificial distinction that really doesn't
exist. Even if there is only a single instance of taskomatic running on a
single machine in the data center, there is going to be parallelization
because the world has gone heavily SMP, whether multi-socket or multi-core or
both. By the very nature of its work, taskomatic is not going to be
bottlenecked on CPU, instead spending a lot of time waiting on results from
operations. To maximise utilization of a single node, taskomatic will want to
be heavily parallelized, whether fork() based or thread based, on some
multiple of the number of CPUs. On a 4 logical CPU machine we perhaps want 16
taskomatic threads running. So whether those 16 threads are on a single 4 cpu
machine, or a pair of 2 cpu machines, is not a distinction we need to
consider. We just scale horizontally to add capacity as required.

> 2) We need to lock rows in the database as each taskomatic wakes up and finds
> work to do. Luckily both postgres and activerecord support row locking, so the
> underlying infrastructure is there.

We only need row locking if you're working on the model where you keep the
transaction open for the duration of taskomatic's processing for that
particular job and commit/rollback on completion. It may be that you simply
immediately mark a task as 'in progress' and commit that change right at the
start. Then later have a second transaction where you fill in the result of
the task, whether success or failure.

> In the standalone case, taskomatic should wake up, look at how many different
> VMs (or storage pools) there are currently tasks queued for, and fork off that
> many workers to do work (i.e. if you have start_vm 1, start_vm 2, stop_vm 1 in
> the queue, you would fork off two workers).
> Each worker would lock all of the
> rows of the database corresponding with their VM (i.e. the first worker would
> lock all rows having to do with VM 1), and then busy themselves with executing
> the actions for that VM serially. I guess the locking isn't strictly necessary
> here, since we can tell each worker which VM or storage ID it should work on,
> but it makes it more like the multihost case.

Forking a thread per VM doesn't work because there can be ordering
requirements between tasks on different VMs, and/or storage. Explicit task
dependencies need to be tracked. At which point, each taskomatic
process/thread in existence simply waits for a task to arrive which has no
pending dependent tasks, claims it and goes to work on it. Completing the task
will then satisfy dependent tasks, allowing them to be processed, and so on.
No need to specialize a particular worker process to a particular object.

> Note that in both standalone and multihost case, it's OK for multiple
> taskomatics to be sending commands to identical managed nodes. Libvirtd itself
> is serial, so commands might get intertwined, but that's OK since we are
> explicitly making sure our taskomatics work on different VMs or storage pools.

Don't rely on libvirtd being serial - we may well find ourselves making it
fully parallelized, allowing operations to be made & executed concurrently. At
the very least we'll have concurrent execution when adding async background
jobs.

> 3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this
> one; we are modifying state external to the database, so I'm not sure
> "rolling-back" a transaction means a whole hill of beans to us. In fact, I
> might argue that rolling back is worse in this case; if you modified external
> state, and then crashed, when you come back you might "roll-back" your VM state
> to something that's totally invalid, and you'll need to be corrected by
> host-status anyway.
> Does anyone have further thoughts here?

I agree that life probably isn't going to be as simple as just rolling back. I
think it's much more likely we'll need to explicitly track the failure against
the task. So, similar to the example I mentioned earlier, we mark the task as
in progress in the DB, and then later update it with the outcome of the task.
If a task failed, you'd then want to fail any tasks depending on it, and tasks
depending on those, etc, etc. This gives oVirt the ability to track the
failures and automatically re-schedule new tasks to try again, or let the
admin choose a different action. Simply rolling back the transaction means
you're not capturing any of this and just re-trying over and over without
necessarily solving the problem.

Regards,
Daniel.
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
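The dependency-tracking model Daniel describes (claim only tasks whose dependencies have succeeded; record outcomes explicitly; cascade failures to dependents) can be sketched in plain Ruby. The task structure and helper names are invented for illustration; real state would live in the DB.

```ruby
# Sketch of dependency-aware scheduling with explicit outcome tracking.
# state is one of: "queued", "succeeded", "failed".
DepTask = Struct.new(:id, :deps, :state)

# A task is runnable once every dependency has succeeded.
def ready_tasks(tasks)
  by_id = tasks.to_h { |t| [t.id, t] }
  tasks.select do |t|
    t.state == "queued" && t.deps.all? { |d| by_id[d].state == "succeeded" }
  end
end

# Record the outcome; a failure cascades to all transitive dependents,
# which oVirt could then surface to the admin or re-schedule.
def record_outcome!(tasks, task, ok)
  task.state = ok ? "succeeded" : "failed"
  return if ok
  tasks.each do |t|
    record_outcome!(tasks, t, false) if t.state == "queued" && t.deps.include?(task.id)
  end
end

tasks = [
  DepTask.new(:start_pool, [], "queued"),
  DepTask.new(:start_vm_x, [:start_pool], "queued"),
  DepTask.new(:start_vm_y, [:start_pool], "queued"),
]
```

Here only `:start_pool` is initially runnable; if it fails, both VM starts are failed rather than left hanging, matching the "fail tasks depending on it" behaviour above.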
On Mon, 2008-06-23 at 11:06 +0200, Chris Lalancette wrote:
> 3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this
> one; we are modifying state external to the database, so I'm not sure
> "rolling-back" a transaction means a whole hill of beans to us. <snip>
> Does anyone have further thoughts here?

I wouldn't even worry about rolling back transactions - it seems like a lot of
pain for very dubious gain.

A slightly related question is: how do you deal with user actions that require
multiple tasks to be performed? You'd need to keep track of some higher-level
object that groups related tasks together, and when one of them fails give the
user an option to retry (e.g. when creating a StoragePool failed, the user
manually adds more storage and then restarts that whole task group, which will
only redo the tasks that failed or blocked on failure).

David
So I've been doing some thinking on this, and here's what I've come
up with to date. As usual any input is appreciated.
I wanted to make sure we had some basic requirements we could discuss.
Taskomatic should:
- Execute many tasks in a timely manner (presumably via a distributed,
multi-threaded setup)
- Execute tasks in the correct order.
- Implement transaction support (if we can/want it)
- Have good error reporting.
- Include authentication and encryption.
- It would be nice to be able to see the state of the queues in the WUI.
- Implement at least the following tasks:
- start/stop/create/destroy VMs
- clear host functionality. Destroy all VMs/pools.
- Migrate vms.
- Storage pool creation/destruction/management
Now, if we break the system down into basic components, I'm thinking:
- Task producer: This is the WUI now but could also be some other script in the
future. Creates tasks and adds them to the queue.
- Task ordering system: At some point I think a single process needs to separate
out the tasks and order them. I think it would be useful to actually move them
into separate independent queues that can be executed in parallel. This would
be done such that each action in a given queue would need to be done in order,
but each queue will not have dependencies on anything in another queue and so
they can be worked on in parallel. While this could be a bottleneck I think
that the logic required to order the tasks will not be onerous and should keep
up with high loads.
- Task implementation system: This would take a given queue and implement the
tasks within it, dispatching requests to hosts as needed. Any errors occurring
in this system will be reported and possibly we could implement rollback.
- Host/vm/pool State: In addition to the above, in order to implement queue
ordering and determine for certain that a given task succeeded, we'll
require host/vm/storage pool state information that is as up to date as
possible.
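The task ordering component above can be sketched in Ruby: split the incoming task list into independent queues, where two tasks land in the same queue iff they touch a common object (VM, host, or pool), found here with a tiny union-find. The task shape (`action:`, `objects:`) is hypothetical.

```ruby
# Sketch of the ordering step: partition tasks into queues that can be
# worked in parallel; within each queue, original order is preserved.
def independent_queues(tasks)
  parent = Hash.new { |h, k| h[k] = k }
  find = ->(x) { parent[x] == x ? x : (parent[x] = find.(parent[x])) }
  tasks.each do |t|
    # Union all objects a task touches into one group.
    roots = t[:objects].map { |o| find.(o) }
    roots.each { |r| parent[r] = roots.first }
  end
  tasks.group_by { |t| find.(t[:objects].first) }.values
end

tasks = [
  { action: "start_pool", objects: ["pool1"] },
  { action: "start_vm",   objects: ["vm1", "pool1"] },  # shares pool1
  { action: "start_vm",   objects: ["vm2"] },           # unrelated
]
queues = independent_queues(tasks)
```

With this input, the pool creation and the VM that uses it end up serialized in one queue, while the unrelated VM start goes into its own queue and can proceed in parallel.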
So in terms of implementing this, a lot of it comes down to technology
selection.
Queues could continue to be implemented in postgresql. It would be nice however
to have something that was event driven and did not require polling. It is
possible we could use a python stored procedure to alert the consumer but that
is postgresql specific and may have its own problems. We may also consider
using qpid for this, as it can do 'durable' queues which are stored on
disk and survive across reboots. Somewhere, however, something needs to have a
complete view of all queues so it can keep track of things.
I think a single ruby process could be used to order the tasks and place them in
per-thread/process queues. If using a DB I think we could either migrate the
entries to a new 'in-progress' table, or update the row with the ID of
the process/thread and possibly the sequence number to be used in implementing
the queue. Again however, this is another polling point and another point where
we could shift to qpid msg queues. Actually, we may be best off getting
commands from the wui via qpid and then keeping the separate queues in the
database, as that would allow the WUI to easily display the status of the work
queues. Jobs could be marked as 'completed' rather than removed until the
queue is complete. Individual processes could be awoken using one of various
methods.
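The "mark completed rather than remove" idea can be sketched as a small queue object: finished jobs stay visible (so the WUI can show progress) until the whole queue drains. The class and field names are illustrative only.

```ruby
# Sketch of a work queue whose jobs stay visible after completion, so a
# WUI could render per-queue progress; the queue is discarded only once
# every job in it has completed.
class WorkQueue
  attr_reader :jobs

  def initialize(names)
    @jobs = names.map { |n| { name: n, state: "queued" } }
  end

  def complete!(name)
    job = @jobs.find { |j| j[:name] == name }
    job[:state] = "completed" if job
    # Drop the queue once everything in it is done.
    @jobs.clear if @jobs.all? { |j| j[:state] == "completed" }
  end

  def status
    @jobs.map { |j| "#{j[:name]}: #{j[:state]}" }
  end
end
```

Partially finished queues keep showing both completed and pending jobs; a fully finished queue disappears from the status view.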
One possibility is that the task implementers be a mix of ruby processes running
on the wui and C or C++ applications running on the node that use qpid modeling
to represent the host, VMs, and storage pools. The managed node daemon would
model the host/vm/storage pools as objects and the wui-side process could call
higher level functions to get things done. eg vm->start() kind of thing
where the daemon would then implement this using the libvirt C api. State
information for the host, VMs and storage pools would also be made available to
the wui side and could be subscribed to by the implementers and the task
ordering system.
Alternatively at this point, we could stick with libvirt. The only problems I
see with this are the libvirt backport issue and the potentially long timeout
issue (it looks like amqp can specify timeouts, or one could go async). It would be
nice to have up to date status information available from the managed nodes as
well. It would also give us the opportunity to clean up a good deal of the
daemons we have currently implementing various things.
Anyway, that's what I'm thinking for now. I left out a lot of details
for the sake of brevity.
Thoughts?
Ian