Hi Folks,

I find myself trying to expand a 2-node high-availability cluster to a 4-node cluster. I'm running Xen virtualization, and currently using DRBD to mirror data and Pacemaker to fail over cleanly.

The thing is, I'm trying to add 2 nodes to the cluster, and DRBD doesn't scale. Also, because of rackspace limits and the hardware at hand, I can't separate storage nodes from compute nodes - instead, I have to live with 4 nodes, each with 4 large drives (but also with 4 GigE ports per server).

The obvious thought is to use Gluster to assemble all the drives into one large storage pool, with replication. But the last time I looked at this (6 months or so back), it looked like some of the critical features were brand new, and performance seemed to be a problem in the configuration I'm thinking of.

Which leads me to my question: has the situation improved to the point that I can use Gluster this way?

Thanks very much,

Miles Fidelman

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra
It would probably be better to ask this with end-goal questions instead of with an unspecified "critical feature" list and "performance problems".

6 months ago, for myself and quite an extensive (and often impressive) list of users, there were no missing critical features, nor were there any problems with performance. That's not to say that they necessarily meet your design specifications, but without those specs you're the only one who could evaluate that.

On 12/26/2012 08:24 PM, Miles Fidelman wrote:
> Which leads me to my question: has the situation improved to the
> point that I can use Gluster this way?
On 12-12-26 10:24 PM, Miles Fidelman wrote:
> Which leads me to my question: has the situation improved to the
> point that I can use Gluster this way?

Hi,

I have a XenServer pool (3 servers) talking to a GlusterFS replica server over NFS, with uCARP for IP failover. The system was put in place in May 2012, using GlusterFS 3.3. It ran very well, with speeds comparable to my existing iSCSI solution (http://majentis.com/2011/09/21/xenserver-iscsi-and-glusterfsnfs/).

I was quite pleased with the system; it worked flawlessly. Until November. At that point, the Gluster NFS server started stalling under load. It would become unresponsive for long enough that the VMs under XenServer would lose their drives. Linux would remount the drives read-only and then eventually lock up, while Windows would just lock up. In this case, Windows was more resilient to the transient disk loss.

I have been unable to solve the problem, and am now switching back to a DRBD/iSCSI setup. I'm not happy about it, but we were losing NFS connectivity nightly, during backups. Life was hell for a long time while I was trying to fix things.

Gerald
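For readers who haven't used it, the uCARP piece of a setup like Gerald's boils down to floating a virtual IP between the storage nodes, so the XenServer NFS storage repository always points at whichever node currently holds the address. A minimal sketch follows; the interface, addresses, password, and script paths are illustrative assumptions, not Gerald's actual configuration:

  # On each Gluster/NFS node (placeholder values):
  ucarp --interface=eth0 --srcip=192.168.1.11 --vhid=1 --pass=secret \
        --addr=192.168.1.10 \
        --upscript=/etc/ucarp/vip-up.sh --downscript=/etc/ucarp/vip-down.sh -B

  # /etc/ucarp/vip-up.sh adds the floating IP the clients mount from:
  #   ip addr add 192.168.1.10/24 dev eth0
  # /etc/ucarp/vip-down.sh removes it again:
  #   ip addr del 192.168.1.10/24 dev eth0

The NFS clients would then mount from 192.168.1.10 rather than from either node's own address.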
On Wed, Dec 26, 2012 at 11:24:25PM -0500, Miles Fidelman wrote:
> I find myself trying to expand a 2-node high-availability cluster
> to a 4-node cluster. I'm running Xen virtualization, and
> currently using DRBD to mirror data, and pacemaker to failover
> cleanly.

Not answering your question directly, but have you looked at Ganeti? This is a front-end to Xen+LVM+DRBD (open source, written by Google) which makes it easy to manage such a cluster, assuming DRBD is meeting your needs well at the moment.

With Ganeti, each VM image is its own logical volume, with its own DRBD instance sitting on top, so you can have different VMs mirrored between different pairs of machines. You can migrate storage, albeit slowly (e.g. starting with A mirrored to B, you can break the mirroring and then re-mirror A to C, and then mirror C to D). Ganeti automates all this for you.

Another option to look at is Sheepdog, which is a clustered block-storage device, but this would require you to switch from Xen to KVM.

> and performance seemed to be a
> problem in the configuration I'm thinking of.

With KVM at least, last time I tried, performance was still very poor when a VM image was being written to a file over gluster - I measured about 6 MB/s. However, remember that each VM can directly mount glusterfs volumes internally, and the performance of this is fine - and it also means you can share data between the VMs. So with some rearchitecting of your application you may get sufficient performance for your needs.

Regards,

Brian.
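As a rough illustration of the two things Brian describes (sketches only; the node names, OS variant, volume name, and mount point below are made-up assumptions, not commands taken from this thread):

  # Ganeti: create a DRBD-mirrored instance with node1 as primary and
  # node2 as secondary (Ganeti 2.x style syntax):
  gnt-instance add -t drbd -o debootstrap+default --disk 0:size=10G \
      -n node1:node2 vm1.example.com

  # Inside a guest: mount a GlusterFS volume directly with the native
  # client instead of keeping the data on the VM image:
  mount -t glusterfs storage1:/shared /srv/shared
  # or in /etc/fstab:
  # storage1:/shared  /srv/shared  glusterfs  defaults,_netdev  0 0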
Look, FUSE has its issues that we all know about. Either it works for you or it doesn't. If FUSE bothers you that much, look into libgfapi.

Re: NFS - I'm trying to help track this down. Please either add your comment to an existing bug or create a new ticket. Either way, ranting won't solve your problem or inspire anyone to fix it.

-JM

Stephan von Krawczynski <skraw at ithnet.com> wrote:

On Wed, 26 Dec 2012 22:04:09 -0800 Joe Julian <joe at julianfamily.org> wrote:
> 6 months ago, for myself and quite an extensive (and often impressive)
> list of users, there were no missing critical features, nor were there
> any problems with performance.

Well, then the list of users does obviously not contain me ;-) The damn thing will only become impressive if a native kernel client module is done. FUSE is really a pain. And read my lips: the NFS implementation has general load/performance problems. Don't be surprised if it jumps into your face. Why on earth do they think Linux has NFS as a kernel implementation?

--
Regards,
Stephan
I am hopeful that 3.4 will go much further in this regard. At this point, when anyone asks me about VM image management, I tell them it works for some and not for others. I've seen enough bad outcomes not to recommend it in all cases, but I've also seen enough good outcomes not to discount it out of hand either.

My answer now is the same as it has been: use at your own risk. But we've made much progress, and the recent qemu integration and libgfapi are a continuation of that. In general, I don't recommend any distributed filesystem for VM images, but I can also see that this is the wave of the future.

-JM

Miles Fidelman <mfidelman at meetinghouse.net> wrote:

Dan Cyr wrote:
> Miles - As is right now, GlusterFS is not what you want for backend VM
> storage.
>
> Question: "how well will this work?"
>
> Answer: "horribly"

Ok... that's the kind of answer I was looking for (though a disappointing one).

Thanks,

Miles

--
In theory, there is no difference between theory and practice.
In practice, there is. .... Yogi Berra
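For context on the qemu/libgfapi integration JM mentions: a libgfapi-enabled qemu (1.3 or later, built with GlusterFS support) can open a guest disk over gluster:// directly, bypassing FUSE. A sketch, with made-up server, volume, and image names:

  # Create an image directly on a Gluster volume:
  qemu-img create gluster://server1/vmimages/vm1.img 6G

  # Boot a guest from it:
  qemu-system-x86_64 -drive file=gluster://server1/vmimages/vm1.img,if=virtio ...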
Fidelman,

> Let's say that I take a slightly looser approach to high-availability:
> - keep the static parts of my installs on local disk
> - share and replicate dynamic data using gluster
> - failover by rebooting on a different node (no image to worry about
>   migrating)
>
> In this scenario, how well does gluster work when:
> - storage and processing are inter-mixed on the same nodes

Have you checked out GFS? If your hardware is IPMI capable, your configuration is a perfect candidate for GFS. It is actually also far more reliable. I have not used it in production, but I have set it up and played around with it. I'm also on their mailing list, and they have good words for it.

GlusterFS is far better for detached storage that needs scaling, is cheap (you don't need IPMI-capable servers), and is also redundant. I have been having high CPU utilization and slow writes on the client end, but the server side is very solid. I will keep testing and watching the mailing list going forward.

> - data is triply replicated (allow for 2-node failures)
>
> Miles Fidelman
>
> --
> In theory, there is no difference between theory and practice.
> In practice, there is. .... Yogi Berra

William
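For what it's worth, the IPMI requirement William mentions is about fencing: GFS2, like any shared-disk cluster filesystem, needs a way to forcibly power-cycle a misbehaving node. A quick sanity check that a node's IPMI interface can be driven for fencing might look like the following (the address and credentials are placeholders):

  # Query power state over the IPMI LAN interface:
  ipmitool -I lanplus -H 10.0.0.21 -U admin -P secret chassis power status

  # The same interface is what a cluster fence agent would use, e.g.:
  fence_ipmilan -a 10.0.0.21 -l admin -p secret -o status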
Joe,

> I have 3 servers with replica 3 volumes, 4 bricks per server on lvm
> partitions that are placed on each of 4 hard drives, 15 volumes
> resulting in 60 bricks per server. One of my servers is also a kvm host
> running (only) 24 vms.

Would you mind explaining your setup again? I could not quite follow it, probably because of terminology issues. For example, "4 bricks per server" - I don't understand this part; I assumed a brick == 1 physical server (okay, it could also be one VM, but I don't see how that would help unless it's a test environment). The way you put it, though, means I have issues with my terminology. Isn't there a 1:1 relationship between brick and server?

> Each vm image is only 6 gig, enough for the operating system and
> applications and is hosted on one volume. The data for each application
> is hosted on its own GlusterFS volume.

Hmm, pretty good idea, especially security-wise. It means one VM cannot mess with another VM's files. Is it possible to extend a Gluster volume without destroying and recreating it with a bigger peer storage setting?

> For mysql, I set up my innodb store to use 4 files (I don't do 1 file
> per table), each file distributes to each of the 4 replica subvolumes.
> This balances the load pretty nicely.

I thought lots of small files would be better than 4 huge files? I mean, why does this work out better performance-wise? I'm not saying it's wrong; I am just trying to learn from you, as I am looking for a similar setup. However, I could not think why using 4 files would be better, but that may be because I don't understand how GlusterFS works.

> I don't really do anything special for anything else, other than the php
> app recommendations I make on my blog (http://joejulian.name) which all
> have nothing to do with the actual filesystem.

Thanks for the link.

> The thing that I think some people (even John Mark) misapply is that
> this is just a tool. You have to engineer a solution using the tools you
> have available. If you feel the positives that GlusterFS provides
> outweigh the negatives, then you will simply have to engineer a solution
> that suits your end goal using this tool. It's not a question of whether
> it works, it's whether you can make it work for your use case.
>
> On 12/27/2012 03:00 PM, Miles Fidelman wrote:
>> Ok... now that's diametrically the opposite response from Dan Cyr's of
>> a few minutes ago.

William
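To make the MySQL point concrete: the "4 files" Joe describes are configured via innodb_data_file_path in my.cnf, so the shared InnoDB tablespace is split into several files that GlusterFS can then place on different bricks (with filenames chosen so the distribute hashing spreads them evenly). A sketch with illustrative names and sizes, not Joe's actual settings:

  # /etc/mysql/my.cnf (illustrative values)
  [mysqld]
  innodb_data_home_dir = /var/lib/mysql
  innodb_data_file_path = ibdata1:2G;ibdata2:2G;ibdata3:2G;ibdata4:2G:autoextend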
Thanks Joe,

>> Isn't there a 1:1 relationship between brick and server?
>
> In my configuration, 1 server has 4 drives (well, 5, but one's the OS).
> Each drive has one gpt partition. I create an lvm volume group that
> holds all four huge partitions. For any one GlusterFS volume I create 4
> lvm logical volumes:
>
> lvcreate -n a_vmimages clustervg /dev/sda1
> lvcreate -n b_vmimages clustervg /dev/sdb1
> lvcreate -n c_vmimages clustervg /dev/sdc1
> lvcreate -n d_vmimages clustervg /dev/sdd1
>
> then format them xfs and (I) mount them under
> /data/glusterfs/vmimages/{a,b,c,d}. These four lvm partitions are bricks
> for the new GlusterFS volume.

Followed. Actually, I'm going to redo it this way, but I will use a RAID instead of individual drives. Thanks.

> As glusterbot would say if asked for the glossary:
>> A "server" hosts "bricks" (ie. server1:/foo) which belong to a
>> "volume" which is accessed from a "client".

Yes, I checked the manual glossary and it's well explained. I had yet to read those last pages.

> My volume would then look like
>
> gluster volume create vmimages replica 3 \
>     server{1,2,3}:/data/glusterfs/vmimages/a/brick \
>     server{1,2,3}:/data/glusterfs/vmimages/b/brick \
>     server{1,2,3}:/data/glusterfs/vmimages/c/brick \
>     server{1,2,3}:/data/glusterfs/vmimages/d/brick

>>> Each vm image is only 6 gig, enough for the operating system and
>>> applications and is hosted on one volume. The data for each application
>>> is hosted on its own GlusterFS volume.
>>
>> Hmm, pretty good idea, especially security-wise. It means one VM cannot
>> mess with another VM's files. Is it possible to extend a Gluster volume
>> without destroying and recreating it with a bigger peer storage setting?
>
> I can do that two ways. I can add servers with storage and then
> add-brick to expand, or I can resize the lvm partitions and grow xfs
> (which I have done live several times).

Will be going with lvm, now that I understand what a brick is.

>>> For mysql, I set up my innodb store to use 4 files (I don't do 1 file
>>> per table), each file distributes to each of the 4 replica subvolumes.
>>> This balances the load pretty nicely.
>
> It's not so much a "how glusterfs works" question as much as it is a how
> innodb works question. By configuring the innodb_data_file_path to start
> with a multiple of your bricks (and carefully choosing some filenames to
> ensure they're distributed evenly), records seem to be (and I only have
> tested this through actual use and have no idea if this is how it's
> supposed to work) accessed evenly over the distribute set.

Hmm, have you checked on the gluster servers that these four files actually are on separate bricks? As far as I understand, if you have not done anything with the GlusterFS scheduler (default ALU on version 3.3), it is likely that is not what's happening. Or you are using a version that has a different scheduler. Interesting, though. Poke around and update us, please.

Thanks

William
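On the question of checking where those four files actually landed: one way, assuming the volume is fuse-mounted at /mnt/mysql and the files are named ibdata1..4 (both assumptions on my part), is to ask the client for the pathinfo xattr, which reports the backing brick for each file:

  getfattr -n trusted.glusterfs.pathinfo -e text /mnt/mysql/ibdata1
  getfattr -n trusted.glusterfs.pathinfo -e text /mnt/mysql/ibdata2

And for the "resize the lvm partitions and grow xfs" route Joe mentions, the live-growth step would look roughly like this (the volume group, logical volume, and mount point names follow his examples; the size is made up):

  lvextend -L +100G /dev/clustervg/a_vmimages
  xfs_growfs /data/glusterfs/vmimages/a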