No problem! With glusterfs, replication is done client side by writing to
multiple bricks at the same time, which means your client bandwidth gets
divided by the number of replicas you choose. So with replica 2 your
bandwidth is cut in half, with replica 3 it drops to 1/3, and so on. Be
sure to keep that in mind when looking at the numbers. Writing in really
small blocks / file sizes can also slow things down quite a bit; I try to
keep writes to at least 64KB blocks (1024k was the sweet spot in my
testing). I think your use case is a good fit for us and I would love to
see y'all using gluster in production. The only concern I have is
small-file performance, and we are actively working to address that. If
small-file performance becomes a focus of yours, I suggest trying out the
multi-threaded epoll changes that are coming down the pipe.
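If you want a quick sanity check of sequential throughput with a large
block size, something like this against the client mount point works (the
/mnt/gluster path is just a placeholder for wherever you mounted the
volume):

# dd if=/dev/zero of=/mnt/gluster/ddtest.img bs=1024k count=4096 conv=fdatasync
# echo 3 > /proc/sys/vm/drop_caches
# dd if=/mnt/gluster/ddtest.img of=/dev/null bs=1024k

The fdatasync and the cache drop are there so the numbers reflect the
network and bricks rather than client memory.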
Also, I wrote a script that will help you out with the configuration /
tuning. It was developed specifically for RHEL but should work on CentOS
or other RPM-based distros as well. If you want to tear down your
environment and easily rebuild, try the following:
Use kernel 2.6.32-504.3.2.el6 or later if using thin provisioning (thinp)
- there were issues with earlier kernels.
Before this script is run, the system should be wiped. To do so, run the
following on each gluster server (be sure to unmount the clients first):
# service glusterd stop; if [ $? -ne 0 ]; then pkill -9 glusterd; fi
# umount <my brick - /rhs/brick1>
# vgremove <my VG - RHS_vg1> -y --force --force --force
# pvremove <My PV - /dev/sdb> --force --force
# pgrep gluster; if [ $? -eq 0 ]; then pgrep gluster | xargs kill -9; fi
# for file in /var/lib/glusterd/*; do if ! echo $file | grep 'hooks' >/dev/null 2>&1; then rm -rf $file; fi; done
# rm -rf /bricks/*
***NOTE*** Replace the < items > above with your actual paths / devices ***
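After the teardown, a quick generic sanity check (not part of the script
itself) to confirm each node really is clean:

# pgrep -l gluster          # should print nothing
# ls /var/lib/glusterd      # only the hooks directory should remain
# lsblk                     # the old brick VG/LV should be gone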
The prerequisites for the installer script are:
* The network needs to be installed and configured.
* DNS (forward and reverse) needs to be operational if using host names.
* At least 2 block devices need to be present on each server. The first
  should be the O/S device; all other devices will be used for brick storage.
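A simple way to check the DNS requirement in both directions from each
node (the hostnames and the IP below are just placeholders):

# host node1.example.com     # forward lookup should return the node's IP
# host 192.168.1.11          # reverse lookup should return node1.example.com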
Edit the installer script to select/define the correct RAID being used.
For example, with RAID 6 (the default for Red Hat Storage) we have:
stripesize=128k
stripe_elements=10 # number of data disks, not counting spares
dataalign=1280k
Here dataalign is just stripesize * stripe_elements (128k * 10 = 1280k).
RAID 6 works better with large (disk image) files.
RAID 10 works better with many smaller files.
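For reference, values like these normally get passed down to the LVM and
XFS layers roughly as follows. This is only a sketch of the usual
alignment commands, not necessarily exactly what the script runs, and the
device / VG / LV names are placeholders:

# pvcreate --dataalignment 1280k /dev/sdb
# vgcreate RHS_vg1 /dev/sdb
# lvcreate -l 100%FREE -n RHS_lv1 RHS_vg1
# mkfs.xfs -i size=512 -d su=128k,sw=10 /dev/RHS_vg1/RHS_lv1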
Running the command with no options will display the help information.
Usage: rhs-system-init.sh [-h] [-u virtual|object]
General:
  -u <workload>   virtual - RHS is used for storing virtual machine images.
                  object  - Object Access (HTTP) is the primary method used
                            to access the RHS volume.
                  The workload helps decide what performance tuning profile
                  to apply and other customizations. By default, the general
                  purpose workload is assumed.
  -d <block dev>  Specify a preconfigured block device to use for a brick.
                  This must be the full path and it should already have an
                  XFS file system. This is used when you want to manually
                  tune / configure your brick. Example:
                  $ME -d /dev/myVG/myLV
  -s NODES=" "    Specify a space separated list of hostnames that will be
                  used as glusterfs servers. This will configure a volume
                  with the information provided.
  -c CLIENTS=" "  Specify a space separated list of hostnames that will be
                  used as glusterfs clients.
  -t              Number of replicas. Use 0 for a pure distribute volume.
  -o              Configure Samba on the servers and clients.
  -n              Dry run to show what devices will be used for brick
                  creation.
  -r              Skip RHN Registration.
  -h              Display this help.
As an example:
# sh rhs-system-init.sh -s "node1.example.com node2.example.com
node3.example.com" -c "client1.example.com" -r -t 2
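If you just want to preview which devices would be picked up for bricks
before letting it change anything, the -n dry-run flag described above
should do it, e.g. (same placeholder hostnames):

# sh rhs-system-init.sh -s "node1.example.com node2.example.com node3.example.com" -n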
Let me know if you find the script useful; if so I'll look at doing
something CentOS / community specific so that others can use it.
-b
----- Original Message -----
> From: "Jorge Garcia" <jgarcia at soe.ucsc.edu>
> To: "Ben Turner" <bturner at redhat.com>
> Cc: gluster-users at gluster.org
> Sent: Tuesday, January 27, 2015 1:40:19 PM
> Subject: Re: [Gluster-users] Getting past the basic setup
>
> Hi Ben,
>
> Sorry I didn't reply earlier, one week of much needed vacation got in
> the way...
>
> Anyway, here's what we're trying to do with glusterfs for now. Not much,
> but I figured it would be a test:
>
> Supporting a group that uses genomics data, we have a WHOLE bunch of
> data that our users keep piling on. We figured we needed a place to
> backup all this data (we're talking 100s of TBs, quickly reaching PBs).
> In the past, we have been doing some quick backups for disaster recovery
> by using rsync to several different machines, separating the data by
> [project/group/user] depending on how much data they had. This becomes a
> problem because some machines run out of space while others have plenty
> of room. So we figured: "What if we make it all one big glusterfs
> filesystem, then we can add storage as we need it for everybody, remove
> old systems when they become obsolete, and keep it all under one
> namespace". We don't need blazing speed, as this is all just
disaster
> recovery, backups run about every week. We figured the first backup
> would take a while, but after that it would just be incrementals using
> rsync, so it would be OK. The data is mixed, with some people keeping
> some really large files, some people keeping millions of small files,
> and some people somewhere in between.
>
> We started our testing with a 2 node glusterfs system. Then we compared
> a copy of several TBs to that system vs. copying to an individual
> machine. The glusterfs copy was almost 2 orders of magnitude slower.
> Which led me to believe we had done something wrong, or we needed to do
> some performance tuning. That's when I found out that there's plenty of
> information about the basic setup of glusterfs, but not much when you're
> trying to get beyond that.
>
> Thanks for your help, we will look at the pointers you gave us, and will
> probably report on our results as we get them.
>
> Jorge
>
> On 01/16/15 12:57, Ben Turner wrote:
> > ----- Original Message -----
> >> From: "Jorge Garcia" <jgarcia at soe.ucsc.edu>
> >> To: gluster-users at gluster.org
> >> Sent: Thursday, January 15, 2015 1:24:35 PM
> >> Subject: [Gluster-users] Getting past the basic setup
> >>
> >>
> >> We're looking at using glusterfs for some of our storage needs. I have
> >> read the "Quick start", was able to create a basic filesystem, and that
> >> all worked. But wow, was it slow! I'm not sure if it was due to our very
> >> simple setup, or if that's just the way it is. So, we're trying to look
> >> deeper into options to improve performance, etc. In particular, I'm
> >> interested in the NUFA scheduler. But after hours of googling around for
> >> more information, I haven't gotten anywhere. There's a page with
> >> "translator tutorials", but all the links are dead. Jeff Darcy seems to
> >> be working at Red Hat these days, so maybe all his open-source stuff got
> >> removed. I haven't found anything about how to even start setting up a
> >> NUFA scheduled system. Any pointers of where to even start?
> >
> > Hi Jorge. Here are some of my favorite gluster tuning DOCs:
> >
> > https://rhsummit.files.wordpress.com/2014/04/bengland_h_1100_rhs_performance.pdf
> > http://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
> > http://rhsummit.files.wordpress.com/2012/03/england-rhs-performance.pdf
> >
> > They are all pretty much the same just updated as new features were
> > implemented. I usually think of gluster perf for sequential workloads
> > like:
> >
> > writes = 1/2 * NIC bandwidth (when using replica 2) - 20% (with a fast
> > enough back end to service this)
> >
> > .5 * 1250 - 250 = ~500 MB / sec
> >
> > reads = NIC bandwidth - 40%
> >
> > 1250 * .6 = ~750 MB / sec
> >
> > If you aren't seeing between 400-500 MB / sec sequential writes and
> > 600-750 MB / sec sequential reads on 10G NICs with a fast enough back
> > end to service this then be sure to tune what is suggested in that DOC.
> > What is your HW like? What are your performance requirements? Just to
> > note, glusterFS performance really starts to degrade when writing in
> > under 64KB block sizes. What size of files and what record / block size
> > are you writing in?
> >
> > -b
> >
> >> Any help would be extremely appreciated!
> >>
> >> Thanks,
> >>
> >> Jorge
> >>
> >>
> >> _______________________________________________
> >> Gluster-users mailing list
> >> Gluster-users at gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> >>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rhs-system-init (4).sh
Type: application/x-shellscript
Size: 19674 bytes
Desc: not available
URL:
<http://www.gluster.org/pipermail/gluster-users/attachments/20150127/a0a22783/attachment-0001.bin>