Ian, et al.,

I've been doing some thinking about both what taskomatic needs to do in its next incarnation, along with ways of how to do it.

WHAT:
1) Taskomatic needs to be able to run on multiple machines at the same time, accessing a central database.
2) Taskomatic needs to be able to fire off tasks relating to different VMs (or storage pools) concurrently (whether it's run on just one machine or many).

HOW:
1) I think we should actually have two modes for taskomatic: standalone (i.e. I am the only taskomatic) and multi-host (there are other taskomatics). The reason for this is that in the standalone case we probably want to fork one taskomatic process for each VM (or storage pool) we want to perform actions on. In the multi-host case, we don't know how many other taskomatics might be out there doing tasks, so we keep one process per machine (this should be a command-line/config file option).
2) We need to lock rows in the database as each taskomatic wakes up and finds work to do. Luckily both postgres and activerecord support row locking, so the underlying infrastructure is there. In the standalone case, taskomatic should wake up, look at how many different VMs (or storage pools) currently have tasks queued, and fork off that many workers to do the work (i.e. if you have start_vm 1, start_vm 2, stop_vm 1 in the queue, you would fork off two workers). Each worker would lock all of the database rows corresponding to its VM (i.e. the first worker would lock all rows having to do with VM 1), and then busy itself executing the actions for that VM serially. I guess the locking isn't strictly necessary here, since we can tell each worker which VM or storage ID to work on, but it makes the standalone case more like the multi-host case.
In the multi-host case, things are a bit simpler; the taskomatic running on each individual machine would just wake up, find the first task that is not in progress and not locked, and lock all task rows having to do with that VM. Then it would execute those tasks and go back to looking for more. Note that in both the standalone and multi-host cases, it's OK for multiple taskomatics to be sending commands to identical managed nodes. Libvirtd itself is serial, so commands might get interleaved, but that's OK since we are explicitly making sure our taskomatics work on different VMs or storage pools.

3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this one; we are modifying state external to the database, so I'm not sure "rolling back" a transaction means a whole hill of beans to us. In fact, I might argue that rolling back is worse in this case: if you modified external state and then crashed, when you come back you might "roll back" your VM state to something that's totally invalid, and you'll need to be corrected by host-status anyway. Does anyone have further thoughts here?

THOUGHTS:
Interestingly, I think we can evolve the current taskomatic to do this, rather than rewriting the thing from scratch. Since we cleaned up error handling and reporting, I actually feel a lot better about the state of taskomatic. It really just needs corner/error cases handled better, and then the above concepts introduced one at a time. Is there anything in taskomatic right now that people are particularly unhappy about that might warrant a rewrite?

Chris Lalancette
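The wake-up-and-claim step described above might look roughly like the following. This is only an in-memory sketch: the mutex stands in for the database row locks (in a real taskomatic this would be a `SELECT ... FOR UPDATE` via ActiveRecord), and the `Task` fields and state names are made up for illustration.

```ruby
Task = Struct.new(:id, :vm_id, :action, :state)

# In-memory stand-in for the tasks table. A real worker would take
# database row locks here instead of a process-local mutex.
class TaskTable
  def initialize(rows)
    @mutex = Mutex.new
    @rows = rows
  end

  # Atomically claim every queued task belonging to the VM that owns
  # the oldest queued task -- i.e. "lock all rows having to do with
  # VM 1" -- and mark them in progress under one lock.
  def claim_batch
    @mutex.synchronize do
      first = @rows.find { |t| t.state == :queued }
      return [] unless first
      batch = @rows.select { |t| t.state == :queued && t.vm_id == first.vm_id }
      batch.each { |t| t.state = :in_progress }
      batch
    end
  end
end

table = TaskTable.new([
  Task.new(1, 1, :start_vm, :queued),
  Task.new(2, 2, :start_vm, :queued),
  Task.new(3, 1, :stop_vm,  :queued),
])

batch = table.claim_batch   # the first worker claims both VM 1 tasks
```

With the example queue from the mail (start_vm 1, start_vm 2, stop_vm 1), the first claim grabs both VM 1 tasks, and a second worker's claim grabs the VM 2 task.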
On Mon, Jun 23, 2008 at 4:06 AM, Chris Lalancette <clalance at redhat.com> wrote:
> 3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this
> one; we are modifying state external to the database, so I'm not sure
> "rolling-back" a transaction means a whole hill of beans to us. In fact, I
> might argue that rolling back is worse in this case; if you modified external
> state, and then crashed, when you come back you might "roll-back" your VM state
> to something that's totally invalid, and you'll need to be corrected by
> host-status anyway. Does anyone have further thoughts here?

Only that this is the tough nut I was unable to crack. And...

> THOUGHTS:
> Interestingly, I think we can evolve the current taskomatic to do this, rather
> than re-writing the thing from scratch. Since we cleaned up error reporting
> handling and reporting, I actually feel a lot better about the state of
> taskomatic. It really just needs corner/error cases better handled, and then
> introducing some of the above concepts one at a time. Is there anything in
> taskomatic right now that people are particularly unhappy about that might
> warrant a re-write?

...that the 2PC necessity (or not?) in taskomatic was the only issue that warranted a rewrite in my mind. I've been away from taskomatic for a while now; I will take a look at it again. I am glad to read that the exception handling/reporting has been worked on; that was another thing that was an issue.

Steve
Throwing in my thoughts/concerns (though I don't know taskomatic well, so shoot them down if they are invalid).

Chris Lalancette wrote:
> <snip>
> 3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this
> one; we are modifying state external to the database, so I'm not sure
> "rolling-back" a transaction means a whole hill of beans to us. In fact, I
> might argue that rolling back is worse in this case; if you modified external
> state, and then crashed, when you come back you might "roll-back" your VM state
> to something that's totally invalid, and you'll need to be corrected by
> host-status anyway. Does anyone have further thoughts here?

I actually thought I had more problems with this when I first read it, but mostly this all seems very reasonable to me on second pass. I think if the external state has been modified but the operation failed or did not complete as expected, then instead of rolling back it would make more sense to _change_ whatever fields are associated with the changed state, so that when the user sees the wui they do not think nothing has happened, but can instead see a message showing what the real state is. As an example, this could be something like 'VM restart failed, trying again', or some such.

> THOUGHTS:
> Interestingly, I think we can evolve the current taskomatic to do this, rather
> than re-writing the thing from scratch. Since we cleaned up error reporting
> handling and reporting, I actually feel a lot better about the state of
> taskomatic. It really just needs corner/error cases better handled, and then
> introducing some of the above concepts one at a time. Is there anything in
> taskomatic right now that people are particularly unhappy about that might
> warrant a re-write?

I mentioned this in irc, but for those who didn't see it - what if, instead of reading/writing directly to the database, taskomatic _and_ the wui communicate using an amqp queue?
Let me attempt an example of how this might work, and how it could be useful. I am creating a new VM in the wui. I type in new information to start the process, and the info is saved to the database. Now I need to wait for taskomatic to do tasks x, y and z before I can go on to do whatever else I want to do (let's say I want to reboot the VM for some post-install config). If we use queues/messaging, I as the user could immediately say I want to reboot; this request is added to the list after x, y and z (this list could even be shown in the wui, if appropriate). Taskomatic listens for new events to be published on whatever channel it is subscribed to, and starts doing the work as it receives notice. As it completes a task, it fires off a message and any database changes are saved (this could either still be done by taskomatic, or by something else that listens for completion messages and takes care of these updates). It then grabs the next task, and so on. Meanwhile, the wui gets notified as each task is completed, so the user can see where they are in the queue while continuing to do whatever else they might need to do.

A side benefit of this approach is that we may not have a locking issue at all, since the apps work strictly from a queue instead of directly against the db. A last thought is that this might also obviate the need for two different modes (multi-machine or single), though I am not positive on that point.

-j
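The flow above can be sketched in-process using Ruby's thread-safe Queue as a stand-in for the proposed AMQP channel. The task names and the sentinel convention are made up for illustration; a real setup would use durable broker queues rather than in-memory ones.

```ruby
# Two channels: the wui publishes task requests, taskomatic publishes
# completion notices the wui could subscribe to for progress display.
requests    = Queue.new
completions = Queue.new

# "wui" side: enqueue the install steps x, y, z, then the user's
# reboot request, without waiting for the earlier tasks to finish.
%w[create_vm configure_storage install_os reboot_vm].each { |t| requests << t }
requests << :stop   # sentinel so the consumer knows the demo is over

# "taskomatic" side: drain the queue in order, firing off a message
# as each task completes.
worker = Thread.new do
  while (task = requests.pop) != :stop
    completions << "#{task} completed"
  end
end
worker.join
```

The user's reboot simply lands in the queue behind x, y and z, and the completion messages give the wui its progress view without either side polling the database.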
On Mon, Jun 23, 2008 at 11:06:08AM +0200, Chris Lalancette wrote:
> Ian, et.al,
> I've been doing some thinking about both what taskomatic needs to do in its
> next incarnation, along with ways of how to do it.
>
> WHAT:
> 1) Taskomatic needs to be able to run on multiple machines at the same time,
> accessing a central database

This is an over-specialization - basically taskomatic needs to be parallelized whether it runs on one or many machines.

> 2) Taskomatic needs to be able to fire off tasks relating to different VMs (or
> storage pools) concurrently (whether it's just run on one machine or many).

It strikes me that this is avoiding the more general problem - that there are explicit dependencies between tasks. Serializing tasks per-VM does not express this concept of dependencies directly. As an example, a task starting a VM may have a dependency on a task to start a storage pool (or refresh the volume list in an existing pool). Now, while these 2 tasks are pending, another VM start task is scheduled which has a dependency on the same storage task. Or the admin may have some runtime policy to the effect that during the hours 9-5 they want VM 'x' to be running on a machine, and then at 5pm shut down 'x' and start up 'y' in its place. This has a strict ordering requirement between the 2 VMs - they can't be scheduled independently because there won't be RAM for 'y' until 'x' is shut down.

> HOW:
> 1) I think we should actually have two modes for taskomatic: standalone (i.e. I
> am the only taskomatic), and multi-host (there are other taskomatics). The
> reason for this is in the standalone case, we probably want to fork one
> taskomatic process for each VM (or storage pool) we want to perform actions on.
> In the multi-host case, we don't know how many other taskomatics might be out
> there doing tasks, so we keep one process per machine (this should be a
> command-line option/config file option)

Having two modes is inserting an artificial distinction that really doesn't exist. Even if there is only a single instance of taskomatic running on a single machine in the data center, there is going to be parallelization, because the world has gone heavily SMP, whether multi-socket or multi-core or both. By the very nature of its work, taskomatic is not going to be bottlenecked on CPU, instead spending a lot of time waiting on results from operations. To maximize utilization of a single node, taskomatic will want to be heavily parallelized, whether fork() based or thread based, on some multiple of the number of CPUs. On a 4 logical CPU machine you perhaps want 16 taskomatic threads running. So whether those 16 threads are on a single 4 cpu machine, or a pair of 2 cpu machines, is not a distinction we need to consider. We just scale horizontally to add capacity as required.

> 2) We need to lock rows in the database as each taskomatic wakes up and finds
> work to do. Luckily both postgres and activerecord support row locking, so the
> underlying infrastructure is there.

We only need row locking if you're working on the model where you keep the transaction open for the duration of taskomatic's processing of that particular job and commit/rollback on completion. It may be that you simply mark a task as 'in progress' immediately and commit that change right at the start, then later have a second transaction where you fill in the result of the task, whether success or failure.

> In the standalone case, taskomatic should wake up, look at how many different
> VMs (or storage pools) there are currently tasks queued for, and fork off that
> many workers to do work (i.e. if you have start_vm 1, start_vm 2, stop_vm 1 in
> the queue, you would fork off two workers).
> Each worker would lock all of the
> rows of the database corresponding with their VM (i.e. the first worker would
> lock all rows having to do with VM 1), and then busy themselves with executing
> the actions for that VM serially. I guess the locking isn't strictly necessary
> here, since we can tell each worker which VM or storage ID it should work on,
> but it makes it more like the multihost case.

Forking a thread per VM doesn't work because there can be ordering requirements between tasks on different VMs and/or storage. Explicit task dependencies need to be tracked. At which point, each taskomatic process/thread in existence simply waits for a task to arrive which has no pending dependent tasks, claims it, and goes to work on it. Completing the task will then satisfy dependent tasks, allowing them to be processed, and so on. There is no need to specialize a particular worker process to a particular object.

> Note that in both standalone and multihost case, it's OK for multiple
> taskomatics to be sending commands to identical managed nodes. Libvirtd itself
> is serial, so commands might get intertwined, but that's OK since we are
> explicitly making sure our taskomatics work on different VMs or storage pools.

Don't rely on libvirtd being serial - we may well find ourselves making it fully parallelized, allowing operations to be made & executed concurrently. At the very least we'll have concurrent execution when adding async background jobs.

> 3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this
> one; we are modifying state external to the database, so I'm not sure
> "rolling-back" a transaction means a whole hill of beans to us. In fact, I
> might argue that rolling back is worse in this case; if you modified external
> state, and then crashed, when you come back you might "roll-back" your VM state
> to something that's totally invalid, and you'll need to be corrected by
> host-status anyway.
> Does anyone have further thoughts here?

I agree that life probably isn't going to be as simple as just rolling back. I think it's much more likely we'll need to explicitly track the failure against the task. So it's more similar to the example I mentioned earlier, where we mark the task as in progress in the DB, and then later update it with the outcome of the task. If a task failed, you'd then want to fail any tasks depending on it, and tasks depending on those, etc. This gives oVirt the ability to track the failures and automatically re-schedule new tasks to try again, or let the admin choose a different action. Simply rolling back the transaction means you're not capturing any of this and just re-trying over and over without necessarily solving the problem.

Regards,
Daniel.
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
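The dependency-tracking and failure-propagation scheme described above could be sketched as follows. This is an in-memory illustration only: the `Task` fields, state names, and task names are all hypothetical, and a real implementation would keep this in the database.

```ruby
Task = Struct.new(:id, :name, :deps, :state)

# Any idle worker claims a queued task whose dependencies have all
# completed; there is no per-VM worker specialization.
def next_runnable(tasks)
  done = tasks.select { |t| t.state == :done }.map(&:id)
  tasks.find { |t| t.state == :queued && (t.deps - done).empty? }
end

# On failure, transitively fail every task depending on a failed one,
# so oVirt can track the whole broken chain instead of rolling back.
def fail_task(tasks, failed_id)
  tasks.find { |t| t.id == failed_id }.state = :failed
  loop do
    blocked = tasks.select do |t|
      t.state == :queued &&
        t.deps.any? { |d| tasks.find { |x| x.id == d }.state == :failed }
    end
    break if blocked.empty?
    blocked.each { |t| t.state = :failed }
  end
end

tasks = [
  Task.new(1, 'start_storage_pool', [],  :queued),
  Task.new(2, 'start_vm_x',         [1], :queued),  # needs the pool
  Task.new(3, 'post_config_vm_x',   [2], :queued),  # needs the VM start
  Task.new(4, 'start_vm_y',         [],  :queued),  # independent task
]

first = next_runnable(tasks)   # only the pool task has no pending deps
fail_task(tasks, 1)            # pool failed: both VM x tasks fail too
```

After the failure, the independent VM y task is still runnable, while the failed chain is recorded for re-scheduling or admin action.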
On Mon, 2008-06-23 at 11:06 +0200, Chris Lalancette wrote:
> 3) Transaction support in taskomatic (hi slinaberry!). I'm not sure about this
> one; we are modifying state external to the database, so I'm not sure
> "rolling-back" a transaction means a whole hill of beans to us. In fact, I
> might argue that rolling back is worse in this case; if you modified external
> state, and then crashed, when you come back you might "roll-back" your VM state
> to something that's totally invalid, and you'll need to be corrected by
> host-status anyway. Does anyone have further thoughts here?

I wouldn't even worry about rolling back transactions - it seems like a lot of pain for very dubious gain. A slightly related question is: how do you deal with user actions that require multiple tasks to be performed? You'd need to keep track of some higher-level object that groups related tasks together, and when one of them fails, give the user an option to retry (e.g. when creating a StoragePool failed, the user adds more storage manually and then restarts that whole task group, which will only redo the tasks that failed or were blocked on the failure).

David
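The retry-only-what-failed behavior for such a task group might look like this minimal sketch. The group structure, state names, and task names are invented for illustration; "re-executing" a task is reduced to a state change here.

```ruby
Task = Struct.new(:name, :state)

# Re-running a task group redoes only the tasks that failed or were
# blocked on a failure; completed work is left untouched.
def retry_group(group)
  group.each do |t|
    next if t.state == :done   # don't redo completed work
    t.state = :done            # stand-in for actually re-executing it
  end
end

group = [
  Task.new('allocate_storage', :done),     # succeeded on the first run
  Task.new('create_pool',      :failed),   # failed: user fixed storage
  Task.new('refresh_volumes',  :blocked),  # never ran, blocked on above
]
retry_group(group)
```

The key point is that the group, not the individual task, is the unit the user retries after fixing the underlying problem.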
So I've been doing some thinking on this, and here's what I've come up with to date. As usual, any input is appreciated. I wanted to make sure we had some basic requirements we could discuss.

Taskomatic should:
- Execute many tasks in a timely manner (presumably via a distributed/multi-threaded setup).
- Execute tasks in the correct order.
- Implement transaction support (if we can/want it).
- Have good error reporting.
- Include authentication and encryption.
- It would be nice to be able to see the state of the queues in the WUI.
- Implement at least the following tasks:
  - start/stop/create/destroy VMs
  - clear host functionality (destroy all VMs/pools)
  - migrate VMs
  - storage pool creation/destruction/management

Now, if we break the system down into basic components, I'm thinking:
- Task producer: This is the WUI now, but could also be some other script in the future. Creates tasks and adds them to the queue.
- Task ordering system: At some point I think a single process needs to separate out the tasks and order them. I think it would be useful to actually move them into separate independent queues that can be executed in parallel. This would be done such that each action in a given queue would need to be done in order, but each queue will not have dependencies on anything in another queue, so they can be worked on in parallel. While this could be a bottleneck, I think the logic required to order the tasks will not be onerous and should keep up with high loads.
- Task implementation system: This would take a given queue and implement the tasks within it, dispatching requests to hosts as needed. Any errors occurring in this system will be reported, and possibly we could implement rollback.
- Host/vm/pool state: In addition to the above, in order to implement queue ordering and determine for certain that a given task succeeded, we'll require host/vm/storage pool state information that is as up to date as possible.
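The ordering step described above, in its simplest form, partitions the flat task list into independent per-object queues. A rough sketch, keying the queues by a hypothetical vm_id field (real ordering logic would also have to merge queues when tasks depend on more than one object):

```ruby
Task = Struct.new(:id, :vm_id, :action)

tasks = [
  Task.new(1, 1, :start_vm),
  Task.new(2, 2, :start_vm),
  Task.new(3, 1, :stop_vm),
]

# One ordered queue per VM: the queues can be handed to parallel
# workers, while each queue preserves its own internal ordering.
queues = Hash.new { |h, k| h[k] = [] }
tasks.each { |t| queues[t.vm_id] << t }
```

Here the three tasks become two queues, VM 1's start/stop pair and VM 2's start, which the task implementation system can work on in parallel.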
So in terms of implementing this, a lot of it comes down to technology selection. Queues could continue to be implemented in postgresql. It would be nice, however, to have something that is event driven and does not require polling. It is possible we could use a python stored procedure to alert the consumer, but that is postgresql specific and may have its own problems. We may also consider using qpid for this, as it can do 'durable' queues which are stored on disk and survive across reboots. Somewhere, however, it needs to have a complete view of all queues so it can keep track of things.

I think a single ruby process could be used to order the tasks and place them in per-thread/process queues. If using a DB, I think we could either migrate the entries to a new 'in-progress' table, or update the row with the ID of the process/thread and possibly the sequence number to be used in implementing the queue. Again, however, this is another polling point and another point where we could shift to qpid message queues. Actually, we may be best off getting commands from the wui via qpid and then having the separate queues in the database, as that would allow the WUI to easily display the status of the work queues. Jobs could be marked as 'completed' rather than removed until the queue is complete. Individual processes could be awoken using one of various methods.

One possibility is that the task implementers be a mix of ruby processes running on the wui side and C or C++ applications running on the node that use qpid modeling to represent the host, VMs, and storage pools. The managed node daemon would model the host/vm/storage pools as objects, and the wui-side process could call higher-level functions to get things done, e.g. a vm->start() kind of thing, where the daemon would then implement this using the libvirt C API. State information for the host, VMs and storage pools would also be made available to the wui side and could be subscribed to by the implementers and the task ordering system.
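The 'in-progress'/'completed' row lifecycle mentioned above pairs naturally with the two-short-transactions model raised earlier in the thread: claim and commit immediately, do the slow external work outside any transaction, then record the outcome. A minimal in-memory sketch (the `Task` fields and state names are hypothetical, and the transactions are indicated only by comments):

```ruby
Task = Struct.new(:id, :state, :result)

def run_task(task)
  task.state = :in_progress   # transaction 1: claim the row, commit now

  outcome = :success          # ...long-running libvirt work happens
                              # here, with no DB lock held...

  task.state  = :completed    # transaction 2: record the outcome;
  task.result = outcome       # row is kept until the queue finishes
end

t = Task.new(1, :queued, nil)
run_task(t)
```

Because the row stays in the table marked 'completed' rather than being deleted, the WUI can display the whole queue's progress from the same data.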
Alternatively at this point, we could stick with libvirt. The only problems I see with this are the libvirt backport issue and the potentially long timeout issue (it looks like amqp can specify timeouts, or one could go async). It would be nice to have up-to-date status information available from the managed nodes as well. It would also give us the opportunity to clean up a good deal of the daemons we currently have implementing various things.

Anyway, that's what I'm thinking for now. I left out a lot of details for the sake of brevity. Thoughts?

Ian