Joshua Dotson
2014-Jan-15 22:47 UTC
[libvirt-users] Ceph RBD locking for libvirt-managed LXC (someday) live migrations
Hi,

I'm trying to build an active/active virtualization cluster using a Ceph RBD as backing for each libvirt-managed LXC. I know live migration for LXC isn't yet possible, but I'd like to build my infrastructure as if it were. That is, I would like to be sure proper locking is in place for live migrations to someday take place. In other words, I'm building things as if I were using KVM and live migration via libvirt.

I've been looking at corosync, pacemaker, virtlock, sanlock, gfs2, ocfs2, glusterfs, cephfs, ceph RBD and other solutions. I admit that I'm quite confused. If oVirt, with its embedded GlusterFS and its planned self-hosted engine option, supported LXC, I'd use that. However the stars have not yet aligned for that.

It seems that the most elegant and scalable approach may be to utilize Ceph's RBD with its native locking mechanism plus corosync and pacemaker for fencing, for a number of reasons out of scope for this email.

*My question now is in regards to proper locking. Please see the following links. The libvirt hook looks good, but is there any expectation that this arrangement will become a patch to libvirt itself, as is suggested by the second link?*

http://www.wogri.at/en/linux/ceph-libvirt-locking/

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-August/003887.html

Can anyone guide me on how to theoretically build a very "lock" safe 5-node active-active KVM cluster atop Ceph RBD? Must I use sanlock with its NFS or GFS2 with its performance bottlenecks? Does your answer work for LXC (sans the current state of live migration)?

Thanks,
Joshua

--
Joshua Dotson
Founder, Wrale Ltd
Daniel P. Berrange
2014-Jan-16 10:37 UTC
Re: [libvirt-users] Ceph RBD locking for libvirt-managed LXC (someday) live migrations
On Wed, Jan 15, 2014 at 05:47:35PM -0500, Joshua Dotson wrote:
> *My question now is in regards to proper locking. Please see the following
> links. The libvirt hook looks good, but is there any expectation that this
> arrangement will become a patch to libvirt itself, as is suggested by the
> second link?*
>
> http://www.wogri.at/en/linux/ceph-libvirt-locking/
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-August/003887.html
>
> Can anyone guide me on how to theoretically build a very "lock" safe 5-node
> active-active KVM cluster atop Ceph RBD? Must I use sanlock with its NFS
> or GFS2 with its performance bottlenecks? Does your answer work for LXC
> (sans the current state of live migration)?

The "proper" way would likely be to add a new libvirt lock manager plugin that uses Ceph's locking, or perhaps extend virtlockd to be able to acquire locks on Ceph.

The hook scripts aren't called in the right places to be able to do safe locking in all scenarios where you need it; in particular, I don't think it'll cope with migration correctly.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
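As a rough illustration of the hook-based approach discussed above, the sketch below shows how a hook script might decide when to take or drop a lock. It assumes the hook is invoked with the guest name, an operation, and a sub-operation, which is the argument layout libvirt's hook documentation describes. The names `LOCK_ACTIONS` and `lock_action`, and the choice of `prepare`/`release` as trigger points, are hypothetical and are not taken from the linked hook script.

```python
"""Sketch of a hook-style lock dispatcher (hypothetical, not libvirt code)."""

# Assumption: the hook script is invoked as
#   <hook> <guest_name> <operation> <sub_operation> <extra>
# The mapping below is illustrative only.
LOCK_ACTIONS = {
    ("prepare", "begin"): "acquire",  # before the guest starts
    ("release", "end"): "release",    # after the guest's resources are freed
}


def lock_action(argv):
    """Return 'acquire', 'release', or None for one hook invocation."""
    if len(argv) < 3:
        raise ValueError("expected: guest_name operation sub_operation [extra]")
    _guest, operation, sub_operation = argv[:3]
    # Any operation without an entry (for example a migration step)
    # results in no lock action at all.
    return LOCK_ACTIONS.get((operation, sub_operation))
```

In this sketch a migration-related invocation simply falls through to `None`: nothing hands the lock from the source host to the destination host. That is the kind of case where a hook may not be enough and a lock manager plugin would have to do the work.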
Joshua Dotson
2014-Jan-16 19:43 UTC
Re: [libvirt-users] Ceph RBD locking for libvirt-managed LXC (someday) live migrations
On Thu, Jan 16, 2014 at 5:37 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
> > Can anyone guide me on how to theoretically build a very "lock" safe 5-node
> > active-active KVM cluster atop Ceph RBD? Must I use sanlock with its NFS
> > or GFS2 with its performance bottlenecks? Does your answer work for LXC
> > (sans the current state of live migration)?
>
> The "proper" way would likely be to add a new libvirt lock manager
> plugin that uses ceph's locking, or perhaps extend virtlockd to
> be able to acquire locks on ceph.
>
> The hook scripts aren't called in the right places to be able to do
> safe locking in all scenarios where you need it, in particular I
> don't think it'll cope with migration correctly.
>
> Daniel

Thanks for your reply. Would it be unwise to open an issue to request development of such a plugin? Do you think it makes more sense to file such an issue with one or both of the Ceph and libvirt teams?

I just learned that Ceph RBD's locks never time out. Locks must be removed manually, for example when a hypervisor goes missing. Is removing the lock upon such an event something that is typically built into a libvirt lock manager plug-in? Is this in any way a show stopper?

Do you think it may be best for me to drop direct use of RBD for libvirt usage, in favor of GFS2 or OCFS2 atop RBDs?

I'm starting to feel far less enamored with the concept of live migration. It finally dawned on me today that live migration cannot realistically preempt an unplanned outage of any hypervisor node(s). Self-healing offline migration (plus fencing, I suppose) for LXC/KVM with Ceph RBD and proper locking would be enough to keep me happy for a long while, I think.

Besides, this 5-node cluster is the management complex for a much larger Mesos+Aurora cluster. I need to move on to more important things (like running H/A LXC on Mesos Slave via Aurora). :-)

Any thoughts or tips are welcome.

Thanks again,
Joshua

--
Joshua Dotson
Founder, Wrale Ltd
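The failure case raised in this message (a lock that never times out, held by a hypervisor that has disappeared) can be sketched as a small in-memory model. This is only an illustration under stated assumptions: it does not call Ceph, and `LockTable`, `break_stale`, and `recover` are hypothetical names. The one design assumption it encodes is that a stale lock should be broken only after fencing has confirmed the old holder can no longer write to the image.

```python
"""In-memory model of an exclusive image lock that never expires.

Hypothetical names throughout; this models the behaviour described in the
thread and does not talk to a Ceph cluster.
"""


class ImageLocked(Exception):
    """Raised when another host already holds the lock."""


class LockTable:
    def __init__(self):
        self._owners = {}  # image name -> host holding the lock

    def acquire(self, image, host):
        owner = self._owners.get(image)
        if owner is not None and owner != host:
            raise ImageLocked(f"{image} is locked by {owner}")
        self._owners[image] = host

    def release(self, image, host):
        # Only the current holder may release its own lock.
        if self._owners.get(image) == host:
            del self._owners[image]

    def break_stale(self, image, dead_host, fenced_hosts):
        """Remove dead_host's lock, but only once it is confirmed fenced."""
        if dead_host not in fenced_hosts:
            raise RuntimeError(f"{dead_host} is not fenced; refusing to break lock")
        if self._owners.get(image) == dead_host:
            del self._owners[image]


def recover(table, image, dead_host, new_host, fenced_hosts):
    """Offline recovery: break the dead host's lock, then take it over."""
    table.break_stale(image, dead_host, fenced_hosts)
    table.acquire(image, new_host)
    return new_host
```

With this model, `recover(table, "vm1", "node1", "node2", fenced_hosts={"node1"})` succeeds only when `node1` appears in the fenced set. Without that confirmation the lock stays in place and the guest is not restarted elsewhere, which is the safe default when the old hypervisor might still be running.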