Howdy Folks :)

So in my working with taskomatic, I noticed that the save/resume functionality seems a bit broken. The first thing I noticed was that libvirt itself has no concept of a separate state for 'saved', which ovirt attempts to maintain. This could work fine but when a node is disconnected and then reconnected, the state will shift to 'unavailable' and then to 'stopped' when it reconnects. There may be other corner cases where this does not work as well.

The other issue is that it saves the image to /tmp on the local node. This is usually (always?) a memory based FS and so it will be lost on reboot of the node. It also does not allow for resume on a different host, and presently is not removed after resume so the FS will eventually fill up with a few save/resume cycles.

To fix all this I'm thinking we should save our images on a designated storage device, using a new field in the vm database to denote that it is saved, perhaps with the path to the saved image. Once restored or started the image should be deleted.

Ian
Ian Main wrote:
> Howdy Folks :)
>
> So in my working with taskomatic, I noticed that the save/resume functionality seems a bit broken. The first thing I noticed was that libvirt itself has no concept of a separate state for 'saved', which ovirt attempts to maintain. This could work fine but when a node is disconnected and then reconnected, the state will shift to 'unavailable' and then to 'stopped' when it reconnects. There may be other corner cases where this does not work as well.
>
> The other issue is that it saves the image to /tmp on the local node. This is usually (always?) a memory based FS and so it will be lost on reboot of the node. It also does not allow for resume on a different host, and presently is not removed after resume so the FS will eventually fill up with a few save/resume cycles.
>
> To fix all this I'm thinking we should save our images on a designated storage device, using a new field in the vm database to denote that it is saved, perhaps with the path to the saved image. Once restored or started the image should be deleted.

One of the things we need to think about re: the above is how this interacts with libvirt functionality for snapshotting storage. The two mechanisms should be independent, but will need to be used together at times. libvirt snapshotting has not even been designed yet, so perhaps this discussion needs to happen on libvir-list?

Perry
Ian Main wrote:
> Howdy Folks :)
>
> So in my working with taskomatic, I noticed that the save/resume
> functionality seems a bit broken. The first thing I noticed was that libvirt
> itself has no concept of a separate state for 'saved', which ovirt attempts
> to maintain. This could work fine but when a node is disconnected and then
> reconnected, the state will shift to 'unavailable' and then to 'stopped' when
> it reconnects. There may be other corner cases where this does not work as
> well.
>
> The other issue is that it saves the image to /tmp on the local node. This
> is usually (always?) a memory based FS and so it will be lost on reboot of
> the node. It also does not allow for resume on a different host, and
> presently is not removed after resume so the FS will eventually fill up with
> a few save/resume cycles.
>
> To fix all this I'm thinking we should save our images on a designated
> storage device, using a new field in the vm database to denote that it is
> saved, perhaps with the path to the saved image. Once restored or started
> the image should be deleted.

Yes, sorry, Jim and I had a discussion about this on IRC the other day, I should have sent mail. You are basically exactly right; in its current incarnation save/restore is 100% completely broken. Basically, we need what you say; the user needs to configure a "save pool" (probably per hardware pool), and then on a save we write to that pool, and store something in the database with a pointer to the filename. On a restore, we read that filename back, restore the image, and delete the restore file. I'm sure there are state problems like you point out as well.

-- 
Chris Lalancette
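The save-pool flow described above could look roughly like the following. This is a minimal Python sketch, not the taskomatic implementation: `VmRecord`, `saved_image_path`, `save_vm`, and `restore_vm` are hypothetical names, the database row is modelled as a plain object, and the actual hypervisor calls are passed in as callbacks (with the libvirt Python bindings these would likely be `virDomain.save(path)` and `virConnect.restore(path)`).

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VmRecord:
    """Hypothetical stand-in for a row in the vm table."""
    name: str
    state: str = "running"
    # Hypothetical new column: path of the saved image, or None if not saved.
    saved_image_path: Optional[str] = None


def save_vm(vm: VmRecord, save_pool_dir: str,
            do_save: Callable[[str], None]) -> str:
    """Save the VM into the save pool and record the filename on the VM row."""
    path = os.path.join(save_pool_dir, vm.name + ".save")
    do_save(path)                 # assumption: wraps the real save call
    vm.saved_image_path = path    # pointer to the file, kept in the database
    vm.state = "saved"
    return path


def restore_vm(vm: VmRecord, do_restore: Callable[[str], None]) -> None:
    """Restore the VM from its recorded image, then delete the image."""
    path = vm.saved_image_path
    if path is None:
        raise ValueError("VM %s has no saved image" % vm.name)
    do_restore(path)              # assumption: wraps the real restore call
    os.remove(path)               # avoid filling the pool with stale images
    vm.saved_image_path = None
    vm.state = "running"
```

Because the image lives in a shared pool rather than /tmp on one node, the restore callback could in principle be bound to a different host than the one that performed the save.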
On Thu, 11 Dec 2008 08:21:24 -0800 Ian Main <imain at redhat.com> wrote:
> Howdy Folks :)
>
> So in my working with taskomatic, I noticed that the save/resume functionality seems a bit broken. The first thing I noticed was that libvirt itself has no concept of a separate state for 'saved', which ovirt attempts to maintain. This could work fine but when a node is disconnected and then reconnected, the state will shift to 'unavailable' and then to 'stopped' when it reconnects. There may be other corner cases where this does not work as well.
>
> The other issue is that it saves the image to /tmp on the local node. This is usually (always?) a memory based FS and so it will be lost on reboot of the node. It also does not allow for resume on a different host, and presently is not removed after resume so the FS will eventually fill up with a few save/resume cycles.
>
> To fix all this I'm thinking we should save our images on a designated storage device, using a new field in the vm database to denote that it is saved, perhaps with the path to the saved image. Once restored or started the image should be deleted.

I should also mention that presently, because we use db-omatic to write state back to the database, save/restore will be broken in the UI as well. When you click 'save' it will save the VM and then the state will go to 'stopped' because db-omatic will see the change in libvirt but libvirt doesn't differentiate stopped from saved. If we add the database field to point to the saved image this would let us know that it was stopped and could be restored from image or started again.

Ian
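The state handling described above can be sketched as a small function. This is only an illustration in Python, not db-omatic's actual code: `effective_state` and the `saved_image_path` argument are hypothetical names, and the state strings are the ones used in this thread.

```python
from typing import Optional


def effective_state(libvirt_state: str,
                    saved_image_path: Optional[str]) -> str:
    """Derive the state to show in the UI.

    libvirt reports a saved VM as 'stopped', so the hypothetical
    saved_image_path database field is what tells the two cases apart.
    """
    if libvirt_state == "stopped" and saved_image_path:
        return "saved"
    return libvirt_state
```

With this rule a stopped VM that has a saved image is shown as 'saved' and can be either restored from the image or started fresh, while a stopped VM without one stays 'stopped'.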