I'm experimenting with setting up a SAN and live migrations. I'm running Debian squeeze and usually create domains with xen-tools, although I will consider other possibilities. I saw a tutorial on iSCSI + LVM, set up an iSCSI target for use as a volume group, and made it into a volume group; as far as that goes, it seems to work fine.

What are my possibilities for destruction :-) I'd like to test the limits now, while nothing that really matters is on the VG. If I create a new logical volume on one node, it shows up on the other node without any issues.

Should I be using clvm? I tried to install it, but it seems to want a setup in /etc/cluster. Or should I do the sharing in a different way?

John
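For reference, the usual command sequence for that kind of setup looks roughly like this (hypothetical portal address, device and volume names; on squeeze the initiator side is open-iscsi):

  # discover the target and log in (run on every node)
  iscsiadm -m discovery -t sendtargets -p 192.168.1.10
  iscsiadm -m node --login

  # turn the new LUN into a PV and a VG (run on ONE node only)
  pvcreate /dev/sdb
  vgcreate vg_san /dev/sdb

  # carve out a volume for a domU
  lvcreate -L 10G -n vm1-disk vg_san

  # on the other node, reread the metadata so the new LV shows up
  lvscan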
On Fri, Jun 29, 2012 at 10:32 AM, John McMonagle <johnm@advocap.org> wrote:
> Should I be using clvm?

Not necessarily, but it does have some advantages.

In most VM setups you don't use cluster filesystems (GFS, OCFS2, etc.), so you should never mount the same volume on two machines. That extends to virtual machines too: you must not start the same VM image on two hosts. Most setups help you enforce this, and live migration makes sure the target VM isn't resumed until the original VM is no longer running.

So far, no need for any 'cluster' setup.

LVM reads all the physical volume, volume group and logical volume layouts (what it calls metadata) from the shared storage into RAM at startup and then works from there. The metadata is only rewritten to disk when it's modified (creating/deleting volumes, resizing them, adding PVs, etc.).

That means that if you start more than one box connected to the same PVs, they'll all be able to reach the same VGs and LVs, and as long as you don't modify anything, it will run perfectly. But if you want to make any metadata change, you have to:

1) choose one machine to do the change
2) on every other machine, deactivate the VG (vgchange -a n)
3) do any needed change on the only machine that still has the VG active
4) reread the metadata on all machines (lvscan)

If you have periodic planned downtimes, you can schedule things and work like this; but if you can't afford the few minutes it takes, you need clvm.

What clvm does is use the 'suspend' feature of the device mapper to make sure no process on any machine performs any access to the shared storage until the metadata changes have been propagated. Roughly:

0) a clvmd daemon runs on every machine that has access to the VG; the daemons use a distributed lock manager to keep in touch
1) you run an LVM command that modifies metadata on any machine
2) the LVM command asks the local clvmd process to acquire a distributed lock
3) to get that lock, all the clvmd daemons issue a device-mapper suspend (dmsetup suspend); this doesn't 'freeze' the machine, it only blocks any I/O request to any LV in the VG
4) when all other machines are suspended, the original clvmd has acquired the lock and allows the LVM command to proceed
5) when the LVM command is finished, it asks clvmd to release the lock
6) to release the lock, the daemons on every other machine reread the LVM metadata (lvscan) and lift the suspend
7) when all the machines are resumed, the LVM command returns to the CLI prompt and everything is running again

As you can see, it's the same as the manual process, but since it all happens in a few milliseconds, the 'other' machines can just be suspended instead of having to be really brought down.

I guess it could also be done with a global script that spreads the 'suspend / wait / lvscan / resume' commands; but by the time you get it to run reliably, you've replicated the shared lock functionality.

-- 
Javier
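As a concrete sketch of the manual procedure above (hypothetical VG name vg_san; the reactivation in the last step is implied by the deactivation in step 2):

  # 2) on every node EXCEPT the one making the change
  vgchange -a n vg_san

  # 3) on the remaining node, make the metadata change, e.g.
  lvcreate -L 10G -n vm2-disk vg_san

  # 4) on every other node, reread the metadata and reactivate
  lvscan
  vgchange -a y vg_san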
On Friday 29 June 2012 3:29:47 pm Javier Guerra Giraldez wrote:
> On Fri, Jun 29, 2012 at 10:32 AM, John McMonagle <johnm@advocap.org> wrote:
> > Should I be using clvm?
>
> not necessarily; but it does have some advantages.
> [explanation of manual metadata changes vs. clvm snipped]

Thanks for the information.

I have been trying to get clvm working, but it's dependent on cman and I have not had much luck figuring out cluster.conf. I see the new version in testing has no dependency on cman, so I think I'll try that one. Looks like I will have to upgrade lvm2 also.

I have seen references saying that you cannot do snapshots, and others saying that the snapshot and the snapshotted volume have to be on one node. That I can live with.
Is that the case?

John
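In case it helps with the cluster.conf part: a minimal two-node cman configuration looks roughly like this (hypothetical cluster and node names; the node names must resolve to the hosts), and clvm also wants LVM's cluster locking turned on and the shared VG marked as clustered:

  # /etc/cluster/cluster.conf (identical on both nodes)
  <?xml version="1.0"?>
  <cluster name="xentest" config_version="1">
    <cman two_node="1" expected_votes="1"/>
    <clusternodes>
      <clusternode name="node1" nodeid="1"/>
      <clusternode name="node2" nodeid="2"/>
    </clusternodes>
  </cluster>

  # /etc/lvm/lvm.conf: switch to the clustered locking library
  locking_type = 3

  # mark the shared VG as clustered (once, with clvmd running)
  vgchange -c y vg_san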
On Sat, Jun 30, 2012 at 7:06 PM, John McMonagle <johnm@advocap.org> wrote:
> I have been trying to get clvm working, but it's dependent on cman and I
> have not had much luck figuring out cluster.conf.
>
> I see the new version in testing has no dependency on cman, so I think
> I'll try that one. Looks like I will have to upgrade lvm2 also.
>
> I have seen references saying that you cannot do snapshots, and others
> saying that the snapshot and the snapshotted volume have to be on one
> node. That I can live with. Is that the case?

Since you're using an iSCSI target anyway, why not just set up a separate LUN for each VM? That way you don't have to worry about LVM on the dom0 nodes.

-- 
Fajar
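If you go the LUN-per-VM route, the domU config can point at the iSCSI device directly; the stable /dev/disk/by-path names avoid depending on sdX probe order (hypothetical target address and IQN):

  # in the domU config file
  disk = ['phy:/dev/disk/by-path/ip-192.168.1.10:3260-iscsi-iqn.2012-06.org.example:vm1-lun-1,xvda,w']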
On Saturday, 30.06.2012, at 19:13 +0700, Fajar A. Nugraha wrote:
> Since you're using an iSCSI target anyway, why not just set up a
> separate LUN for each VM? That way you don't have to worry about LVM
> on the dom0 nodes.

I've tried that, but I was never able to figure out how a smoothly working multipath setup could be done for direct LUN/VM mapping. After some xm/xl create and migrate cycles, either dm-multipath or open-iscsi always had issues with locked devices, which in most cases couldn't be freed. I assume better scripting skills for my xen/scripts/block-* attach/detach scripts could have mitigated that.

Additionally, using a fixed set of LUNs, connected via a fixed number of paths over fixed dedicated links and combined as PVs into VGs, gave me the chance to tune every TCP and iSCSI value (target as well as initiator) to ideally fit the latency and bandwidth. Having different kinds of disks and RAID behind the LUNs gives me the possibility of offering different service tiers with guaranteed throughput and predictable latency to each dom0.

For future setups clvm will be my first choice again; only the lack of snapshots for clustered VGs is a pain. With the 1:1 LUN/VM mapping you suggest, it's up to your storage whether (and how) snapshots can be triggered, so if snapshots are required, one can't use clvm.

At least one could drop the "c" and use LVM without any locking mechanism, but when it comes to lvresize, lvcreate -s, etc., reloading the LVM layout on every dom0 becomes mandatory. I wouldn't recommend LVM without clustering, as it's extremely easy to get the layouts out of sync, which includes potential data loss (requires grande cojones...).

By now, I'm running different test scenarios with plain ocfs2 and tap:aio backends. Performance is not too bad, but I recently managed to I/O-deadlock my dom0s under high domU I/O demand. Tracking that down kept me from trying the lvm+ocfs2 combination and other weird scenarios ;)

My advice to John would be: learn the very basics of cman and friends (RHEL6 and derivatives offer very nice tutorials; as a starting point, look for luci and ricci) and try gfs, ocfs2 and/or clvm.

cheers,
Stephan
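For comparison, the plain ocfs2 + tap:aio variant Stephan describes keeps disk images as files on a cluster filesystem mounted on every dom0, e.g. (hypothetical device, mount point and image name; ocfs2 needs its o2cb cluster stack configured first):

  # mount the shared ocfs2 volume at the same path on every dom0
  mount -t ocfs2 /dev/sdc /mnt/guests

  # the domU config references the image file via blktap
  disk = ['tap:aio:/mnt/guests/vm1.img,xvda,w']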