Hi, I've recently been working on setting up a set of libvirt compute nodes that will be using a ceph rbd pool for storing VM disk images. I've run into a couple of issues.

First, per the standard ceph documentation examples [1], the way to add a disk is to create a block in the VM definition XML that looks something like this:

<disk type='network' device='disk'>
  <source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
    <host name='{monitor-host-1}' port='6789'/>
    <host name='{monitor-host-2}' port='6789'/>
    <host name='{monitor-host-3}' port='6789'/>
  </source>
  <target dev='vda' bus='virtio'/>
  <auth username='libvirt'>
    <secret type='ceph' uuid='9ec59067-fdbc-a6c0-03ff-df165c0587b8'/>
  </auth>
</disk>

The trouble with this approach is that the ceph cluster details (secret uuid and monitor host list) have to be repeated in every single VM disk definition. That makes for a lot of maintenance when those details need to change (e.g. replacing a monitor host (common), or changing the auth details (less common)).

I'd prefer to define a libvirt storage pool that contains those details, and then reference the disks within each VM as volumes, so that I only need to change the ceph monitor/auth details once per libvirt compute host rather than in every single VM disk definition.

I've rebuilt my libvirt packages with --with-rbd-support so that I can successfully define a libvirt storage pool as follows:

<pool type='rbd'>
  <name>libvirt-rbd-pool</name>
  <source>
    <name>libvirt-pool</name>
    <host name='{monitor-host-1}' port='6789'/>
    <host name='{monitor-host-2}' port='6789'/>
    <host name='{monitor-host-3}' port='6789'/>
    <auth username='libvirt' type='ceph'>
      <secret uuid='9ec59067-fdbc-a6c0-03ff-df165c0587b8'/>
    </auth>
  </source>
</pool>

However, when I try to start a VM with a volume from that pool, defined as follows, I get an error:

<disk type='volume' device='disk'>
  <source pool='libvirt-rbd-pool' volume='{rbd-volume-name}'/>
  <driver name='qemu' type='raw' cache='writethrough'/>
  <target dev='vda' bus='virtio'/>
</disk>

"using 'rbd' pools for backing 'volume' disks isn't yet supported"

When I dug through the code, it appears that there's an explicit check for RBD-type storage pools (VIR_STORAGE_POOL_RBD) that disables this (libvirt-1.2.13/src/storage/storage_driver.c:3159).

Is there a particular reason for that? Has it just not been implemented yet, or am I specifying the disk definition in the wrong way?

Second, using the former disk definition method, I'm able to run VMs under qemu *and* migrate them. Very slick. Nice work, all.

However, I found that since virt-manager by default leaves the VM defined on both the source and the destination, I'm actually able to start the VM in both places. I didn't see an option to disable that, so I wrote a simple wrapper script to do the right thing via virsh using --undefinesource, but I can't guarantee that some other admin won't skip that and just use the GUI. It appears that libvirt (or is it qemu?) doesn't set rbd locks on the disk images by default.

After running across [2], I had originally thought about writing some hooks to set and release locks on the VMs using the rbd cli, but after reading the docs on the migration process [3], I think that's probably not possible, since the VM is started in both places temporarily.

I think my other option is to set up a shared fs (maybe cephfs) and point virtlockd at it so that all of the libvirt compute hosts register locks on VMs properly.
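For reference, what I have in mind there is roughly the following (untested on my end so far; the lockspace path is just a placeholder for a directory that every compute host would mount from the shared fs):

# /etc/libvirt/qemu.conf -- have the qemu driver use the virtlockd ("lockd") plugin
lock_manager = "lockd"

# /etc/libvirt/qemu-lockd.conf -- register disk leases as lock files in a
# directory shared by all of the compute hosts
file_lockspace_dir = "/srv/libvirt-lockspace"

with libvirtd and virtlockd restarted afterwards on each host.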
However, I thought I'd ask if anyone knows of some other magic parameter or setting I can use to have libvirt/qemu just use rbd locks natively. Or is that not implemented either?

Thanks for your help.

Cheers,
Brian

[1] <http://ceph.com/docs/master/rbd/libvirt/#configuring-the-vm>
[2] <https://www.redhat.com/archives/libvirt-users/2014-January/msg00058.html>
[3] <https://libvirt.org/hooks.html#qemu_migration>
On 03/31/2015 11:47 AM, Brian Kroth wrote:
> The trouble with this approach is that those ceph cluster details
> (secret uuid and monitor host lists) need to be stored separately in
> every single VM disk definition. That makes for a lot of maintenance
> when those details need to change (eg: replace a monitor host (common),
> or change the auth details (less common)).

You can use a hostname for the monitors here for a level of indirection. Monitors in ceph require fixed IPs, so they shouldn't need to be added or removed often in any case.

> When I dug through the code, it appears that there's an explicit check
> for RBD type storage pools (VIR_STORAGE_POOL_RBD) that disables that
> (libvirt-1.2.13/src/storage/storage_driver.c:3159).
>
> Is there a particular reason for that? Has it just not been implemented
> yet, or am I specifying the disk definition in the wrong way?

Just needs implementing afaik. There were patches a while back:

https://www.redhat.com/archives/libvir-list/2014-March/msg00107.html

It'd be great to get those working again.

> After running across [2], I had originally thought about writing some
> hooks to set and release locks on the VMs using the rbd cli, but after
> reading the docs on the migration process [3], I think that's probably
> not possible since the VM is started in both places temporarily.

In the upcoming hammer release of ceph, rbd images can have exclusive locking built in to prevent concurrent access. If two vms end up running against the same rbd image, they will trade the lock back and forth, which is far from ideal, but at least will not corrupt the disk.

It seems like changing the GUI or other wrapper around libvirt to use --undefinesource is a better solution than adding another layer of locking.

Josh
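For reference, the virsh form of that migration looks roughly like this (the VM name and destination host are placeholders):

virsh migrate --live --persistent --undefinesource {vm-name} \
    qemu+ssh://{destination-host}/system

With --undefinesource the domain definition is removed from the source host once the migration completes, so the VM can only be started on the destination afterwards.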
Josh Durgin <jdurgin@redhat.com> 2015-03-31 15:45:
> In the upcoming hammer release of ceph, rbd images can have exclusive
> locking built in to prevent concurrent access. If two vms end up running
> against the same rbd image, they will trade the lock back and forth,
> which is far from ideal, but at least will not corrupt the disk.
>
> It seems like changing the GUI or other wrapper around libvirt to use
> --undefinesource is a better solution than adding another layer of
> locking.
>
>> I think my other option is to setup some shared fs (maybe cephfs) and
>> point virtlockd at it so that all of the libvirt compute hosts register
>> locks on VMs properly. However, I thought I'd ask if anyone knows if
>> there's some magic other parameter or setting I can use to have
>> libvirt/qemu just use rbd locks natively. Or, is that not implemented
>> either?

For what it's worth, I tried that with both virtlockd and sanlock, but neither appeared to even try to register a lock file for the rbd-backed devices. So I guess the only option is to patch the code. I'll see if I can circle back around to that later on, but it might be a while.

Also, in case anyone else sees this: to use flocks on cephfs you currently have to use the in-kernel cephfs client rather than the fuse-based one. In theory that's fixed in an upcoming ceph release, though not for mine (giant):

https://www.mail-archive.com/ceph-users@lists.ceph.com/msg17761.html

Cheers,
Brian
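In case it helps anyone going down that road, a kernel-client cephfs mount for a shared lockspace directory looks roughly like the following (the monitor names, cephx user, and paths are placeholders, and the mount point matches the lockspace directory placeholder used in the earlier virtlockd sketch):

# kernel cephfs client; the secret file holds the base64 key for the cephx user
mount -t ceph {monitor-host-1},{monitor-host-2},{monitor-host-3}:/ /srv/libvirt-lockspace \
    -o name=libvirt,secretfile=/etc/ceph/libvirt.secret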