Rudi Ahlers
2010-Oct-21 08:09 UTC
[Gluster-users] I'm new to Gluster, and have some questions
Hi all,

I'm considering setting up Gluster, and have a few questions if you don't mind.

1. Which option is better? I already have a few CentOS 5.5 servers set up. Would it be better to just install GlusterFS, or to install Gluster Storage Platform from scratch? How / where can I see a full comparison between the two? Are there any performance / management benefits in choosing one over the other?

2. I need reliability and speed. From what I understand, I could set up 2 servers to work similar to software RAID1 (mirroring). Is it also correct to assume that I could use 4 servers in a RAID10 / 1+0 type setup? But then obviously serverA & serverB will be mirrored, and serverC & serverD together? What happens to the data? Does it get filled randomly between the 2 sets of servers, or does it get put onto serverA & B first, till it's full, then move over to C & D?

3. Has anyone noticed any considerable difference between 1x 1GbE NIC and 2x 1GbE NICs bonded together? Or should I rather use a quad-port NIC if / where possible?

4. How do clients (i.e. users) connect if I want to give them normal FTP / SMB / NFS access? Or do I need to mount the exported Gluster volume on another Linux server first which already runs these services?

5. If there are 10 Gluster servers, for example, with a lot of data spread out across them, how do the clients connect, exactly? Do they all connect to a central server which then just "fetches and delivers" the content to the clients, or do the clients connect directly to the specific server where their content is? I.e. is the network traffic split evenly across the servers, according to where the data is stored?

tia :)

--
Kind Regards
Rudi Ahlers
SoftDux
Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532
Daniel Mons
2010-Oct-22 00:03 UTC
[Gluster-users] I'm new to Gluster, and have some questions
On Thu, Oct 21, 2010 at 6:09 PM, Rudi Ahlers <Rudi at softdux.com> wrote:

> 1. Which option is better? I already have a few CentOS 5.5 servers
> set up. Would it be better to just install GlusterFS, or to install
> Gluster Storage Platform from scratch? How / where can I see a full
> comparison between the two? Are there any performance / management
> benefits in choosing one over the other?

Gluster Storage Platform is near zero effort to set up. Literally boot from the provided USB stick image, and follow your nose. From there, all setup is via a GUI, and it's easy for novices to see what's going on. The downside is that with all of that GUI management you lose a lot of low-level control (and, IMHO, understanding of what's going on). So the trade-off is whether you want a graphical management tool where a lot of the "black magic" is hidden, or whether you want to roll up your sleeves and control the system yourself.

As a long-time Linux sysadmin, I prefer the GlusterFS option on a Linux distro of my choice. Pretty GUIs are nice for Windows and VMware users who generally fear keyboards, but give me a CLI (and SSH access!) any day. Personal preference; caveat emptor.

> 2. I need reliability and speed. From what I understand, I could set up
> 2 servers to work similar to software RAID1 (mirroring). Is it also
> correct to assume that I could use 4 servers in a RAID10 / 1+0 type
> setup? But then obviously serverA & serverB will be mirrored, and
> serverC & serverD together? What happens to the data? Does it get
> filled randomly between the 2 sets of servers, or does it get put onto
> serverA & B first, till it's full, then move over to C & D?

There's no right and wrong here. You can set up individual disks as bricks if you like (multiple bricks per server), or you can LVM/JBOD them up and present one big brick per node, or you can use RAID per node. The performance and reliability trade-offs of each really depend on your own needs.
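For the four-server "RAID10-style" layout asked about, here is a minimal sketch using GlusterFS 3.1's CLI (server names and brick paths are hypothetical, and these commands need a running glusterd on each node):

```shell
# On serverA: add the other three nodes to the trusted pool
gluster peer probe serverB
gluster peer probe serverC
gluster peer probe serverD

# Create a distributed-replicated ("RAID10-style") volume.
# With "replica 2", bricks pair up in the order listed, so
# serverA/serverB mirror each other and serverC/serverD mirror
# each other; files are then distributed across the two pairs.
gluster volume create datavol replica 2 transport tcp \
    serverA:/export/brick serverB:/export/brick \
    serverC:/export/brick serverD:/export/brick
gluster volume start datavol
```

To answer the "where does the data go" part: files are placed on one mirrored pair or the other based on a hash of the filename, so the two pairs fill roughly evenly rather than A/B filling up first and then spilling over to C/D.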
Speaking for myself, I currently use RAID5 for nodes with 6 or fewer disks, and RAID6 for 7 or more, presenting a single logical storage brick per node regardless of how many physical disks are in each. My GlusterFS setups are replicate+stripe across the whole cluster, so there are multiple levels of redundancy (within the node, and within the cluster), which lets me sleep easier at night.

As for how the data gets shuffled about inside GlusterFS, that depends on how you've set it up. For distributed data, there are various thresholds you can set to make sure that once a limit is hit, data will be written to other servers by preference. With replicate and stripe that isn't so much of an option, as data will roughly fill all nodes evenly. A lot of the technical detail (including how Gluster chooses nodes) is covered in the doco:

http://www.gluster.com/community/documentation/index.php/Translators/cluster

> 3. Has anyone noticed any considerable difference between 1x 1GbE NIC
> and 2x 1GbE NICs bonded together? Or should I rather use a quad-port NIC
> if / where possible?

Simply put, the more NICs the better. If you start to get a lot of clients hitting the storage, you really want a lot of bandwidth to serve it. Plus 2 NICs per box give you redundancy as well, which is an added plus. A quad-port NIC per node could get costly once you add up switch ports and the like; depending on your vendor of choice, the jump to 10GbE may be worth it.

It's also worth remembering that your disks need to be able to feed the network. With 8 commodity 1TB SATA 7200RPM disks and Linux software RAID6, I get about 500MB/s serial reads on a single node (verified by both "dd" and "bonnie++"). That's enough to saturate 2x 1GbE cards, but 10GbE would probably be a waste. If I had faster storage (SAS/FC 10K or 15K RPM drives, or even SAN-backed storage), then 10GbE or even InfiniBand would start to come into consideration.
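The dd part of that check can be sketched like so (file path and size are arbitrary; for a meaningful number the file should be larger than RAM, and the page cache dropped before the read):

```shell
# Write a test file, forcing data to disk so the write rate is honest;
# dd reports throughput on stderr when it finishes
dd if=/dev/zero of=/tmp/ddtest.bin bs=1M count=256 conv=fdatasync

# Drop the page cache first (needs root) so the read actually hits the
# disks, then time a serial read of the same file:
# sync; echo 3 > /proc/sys/vm/drop_caches
dd if=/tmp/ddtest.bin of=/dev/null bs=1M

# Clean up
rm -f /tmp/ddtest.bin
```

bonnie++ gives a more thorough picture (seeks, per-char I/O, file creation), but a quick dd run like the above is usually enough to tell whether the disks or the network will be the bottleneck.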
> 4. How do clients (i.e. users) connect if I want to give them normal
> FTP / SMB / NFS access? Or do I need to mount the exported Gluster volume
> on another Linux server first which already runs these services?

Yes, you need other services in front of GlusterFS. These don't necessarily need to be on separate machines - there's nothing stopping you running Samba/NFS/whatever on one of the Gluster nodes, with a locally mounted GlusterFS re-exported from there. Obviously that means something else could potentially eat into the performance of that node, which is something to consider on large sites.

Remember too that you can spread your services. If you have 4 GlusterFS nodes, you could put Samba/NFS/whatever on all 4, and via some scripting (or even network/DNS/VLAN) magic ensure that all of the users/machines in your org are spread somewhat evenly across all four nodes. That also means that if a single Samba/NFS/whatever server/service dies, only part of your network goes down, and affected users/machines can be migrated to other systems quickly. There are still advantages to having GlusterFS-backed storage even with "legacy" file sharing protocols in place over the top.

> 5. If there are 10 Gluster servers, for example, with a lot of data
> spread out across them, how do the clients connect, exactly? Do
> they all connect to a central server which then just "fetches and
> delivers" the content to the clients, or do the clients connect
> directly to the specific server where their content is? I.e. is the
> network traffic split evenly across the servers, according to where
> the data is stored?

Some explanation here: http://www.youtube.com/watch?v=EbJFWBkQpZ8

The client side of Gluster does a lot of work to decentralise the system.
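Going back to question 4 for a second: the local-mount-and-re-export setup might look roughly like this (the volume name, mount point and share name are hypothetical, and the service name is the CentOS one):

```shell
# On the node that will run Samba: mount the Gluster volume locally
mkdir -p /mnt/datavol
mount -t glusterfs localhost:/datavol /mnt/datavol

# Then export that mount point from smb.conf, e.g.:
#   [shared]
#       path = /mnt/datavol
#       read only = no
#       browseable = yes

# ...and reload Samba so the share appears
service smb reload
```

The same mount point can be exported over FTP or anything else that serves files from a local path; the service never needs to know GlusterFS is underneath.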
There's no "master node" per se, and the Global Name Space allows the client to see all servers at once:

http://en.wikipedia.org/wiki/Global_Namespace

This is quite a bit different to clusters of old, but the advantage is that native Gluster clients will fetch data directly from the storage node that has it (or, if it's striped/replicated, reads will be split across nodes via various load-balancing algorithms defined by you - "least busy", "round robin" and others). There's no need for native Gluster clients to fetch data through a single master node, which alleviates bottlenecks. Obviously this isn't the case for people accessing data through Samba/NFS in front of Gluster, but as before there are things you can do to spread that load and network traffic as well.

The whole concept of Gluster is very clever, and makes a lot of sense. The huge advantage, of course, is that for every node you add, you're also adding bandwidth to the overall cluster. This is the polar opposite of traditional centralised storage systems (SANs, etc.), where adding storage blocks reduces the average bandwidth per client, making performance worse as you scale (don't say that to a SAN vendor, though, because they'll get very upset and red-faced - it's the dirty little secret of the SAN business). Particularly for sites that require consistent storage growth over time (and let's face it - who doesn't?), Gluster is a fantastic idea.

Let's just say that traditional SAN and NAS solutions are now at the bottom of my shopping list when it comes to storage rollouts for business technology infrastructure I'm in charge of designing.

-Dan
Horacio Sanson
2010-Oct-22 00:55 UTC
[Gluster-users] I'm new to Gluster, and have some questions
I am just starting to play with Gluster, but I think I can give you some answers from my experience.

On Thursday 21 October 2010 17:09:32 Rudi Ahlers wrote:

> Hi all,
>
> I'm considering setting up Gluster, and have a few questions if you don't
> mind.
>
> 1. Which option is better? I already have a few CentOS 5.5 servers
> set up. Would it be better to just install GlusterFS, or to install
> Gluster Storage Platform from scratch? How / where can I see a full
> comparison between the two? Are there any performance / management
> benefits in choosing one over the other?

The Gluster Storage Platform includes GlusterFS: the platform is a complete OS (Fedora-based Linux) + GlusterFS + web management in a single package that can be installed via USB in a few minutes. It is supposed to simplify installation, setup and management of GlusterFS clusters, but... I could not get it to work properly. I was unable to add new servers: every time I pressed the "add new server" button I got an error saying "Could not retrive installer ip address". And since the platform is relatively new, there is near-zero documentation and almost nothing in the way of issue reports about it. Also, adding servers/volumes via the command line was never reflected in the web-based GUI.

So I installed Ubuntu 10.10 and GlusterFS 3.1 from source, and handling servers/volumes etc. via the new command line is a breeze.

> 2. I need reliability and speed. From what I understand, I could set up
> 2 servers to work similar to software RAID1 (mirroring). Is it also
> correct to assume that I could use 4 servers in a RAID10 / 1+0 type
> setup? But then obviously serverA & serverB will be mirrored, and
> serverC & serverD together? What happens to the data? Does it get
> filled randomly between the 2 sets of servers, or does it get put onto
> serverA & B first, till it's full, then move over to C & D?

I only have two servers for testing. What you set up are volumes, and each volume can be configured depending on your needs.
This is what I understand so far:

Distributed volume: aggregates the storage of several directories ("bricks" in Gluster terms) across several computers. The benefit is that you can grow/shrink the volume as you please. The bad part is that this offers no reliability guarantee: each file lives whole on a single brick (chosen by hashing the filename), so losing a brick loses the files stored on it.

Replicated volume: requires a minimum of 2 bricks, on separate servers. All files are replicated among the bricks; the number of replicas is configured at volume creation. This adds failure resilience on top of what a distributed volume gives you.

Stripe volume: requires a minimum of 2 bricks, on separate servers. Each file is split into stripes, and the stripes are distributed among the bricks of the volume; the stripe count and size are configured at volume creation. Striping adds no extra reliability (losing a brick loses part of every striped file), but it can improve read performance for large files, as reads are spread across several machines.

> 3. Has anyone noticed any considerable difference between 1x 1GbE NIC
> and 2x 1GbE NICs bonded together? Or should I rather use a quad-port NIC
> if / where possible?
>
> 4. How do clients (i.e. users) connect if I want to give them normal
> FTP / SMB / NFS access? Or do I need to mount the exported Gluster volume
> on another Linux server first which already runs these services?

Gluster 3.1 has a native NFSv3 implementation, so you can mount any Gluster volume as a normal NFS mount. For SMB you need to configure Samba to share the volume, and you can access the files on any of the bricks via SCP or FTP if you have an SSH or FTP server configured. For Linux the recommended way is to use the glusterfs client to mount it as a Gluster file system.

> 5. If there are 10 Gluster servers, for example, with a lot of data
> spread out across them, how do the clients connect, exactly? I.e.
> do they all connect to a central server which then just "fetches and
> delivers" the content to the clients, or do the clients connect
> directly to the specific server where their content is? I.e. is the
> network traffic split evenly across the servers, according to where
> the data is stored?

This is also something I would like to know. When connecting clients I use the command:

    mount -t [nfs|glusterfs] <ip-address>:<volume-name> /mount/point

where ip-address is the IP of any of the servers that have the volume configured. It is not clear to me how the reliability part works here: if I disconnect the server with that IP address, I lose access to the files. True, the files are still accessible via other servers, but I need to manually point the mount at another server, which is not exactly high-availability.

--
regards,
Horacio Sanson
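On that last reliability question: with the native glusterfs client, the server named on the mount line is only contacted to fetch the volume file at mount time; after that the client connects to every brick directly, so an already-established glusterfs mount should survive that particular server going down. Only new mounts (and NFS mounts, which do stay tied to the one server they mounted from) are affected, and round-robin DNS is one common way to soften that. A sketch, with hypothetical host and volume names:

```shell
# Native client: server1 is only used to fetch the volume file at
# mount time; afterwards the client talks to all bricks directly.
mount -t glusterfs server1:/datavol /mnt/datavol

# NFS clients stay pinned to the server they mounted from, so a
# round-robin DNS name (e.g. "gluster.example.com" with an A record
# per node) spreads clients across nodes and lets new mounts succeed
# when one node is down. Gluster 3.1's built-in NFS server is v3 only.
mount -t nfs -o vers=3 gluster.example.com:/datavol /mnt/datavol
```

Failing over an NFS client whose server has died still means remounting against another node; only the native client gets transparent continuity for mounts that are already up.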