thr3ads.net - Gluster users - [Gluster-users] Small files [Feb 2015]

Matan Safriel

2015-Jan-29 20:30 UTC

[Gluster-users] Small files

Hi Liam,

Thanks for the comprehensive reply (!)
How many nodes do you safely replicate to with ZFS?
I don't think seek time is much of a concern with SSDs, by the way, so it
does seem that glusterfs is much better for the small files scenario than
HDFS, which as you say is very different in key aspects. I couldn't quite
follow why rebalancing is slow, or slower than in the case of HDFS, unless
you just meant that HDFS works at a large block level and no more.

Perhaps you'd care to comment ;)

Matan

On Thu, Jan 29, 2015 at 9:15 PM, Liam Slusser <lslusser at gmail.com>
wrote:
> Matan - I'll do my best to take a shot at answering this...
>
> They're completely different technologies.  HDFS is not POSIX compliant
> and is not a "mountable" filesystem while Gluster is.
>
> In HDFS land, every file, directory and block is represented as an object
> in the namenode's memory, each of which occupies about 150 bytes.  So 10
> million files, each using one block, would eat up about 3 gigs of memory.
> Furthermore, HDFS was designed for streaming large files - the default
> blocksize in HDFS is 64MB.
>
> Gluster doesn't have a central namenode, so having millions of files
> doesn't put a tax on it in the same way.  But, again, small files cause
> lots of small seeks to handle the replication tasks/checks, which generally
> isn't very efficient.  So don't expect blazing performance...  Rebalancing
> and rebuilding Gluster bricks can be extremely painful, since Gluster isn't
> a block-level filesystem - it has to read each file one at a time.
>
> If you want to use HDFS and don't need a mountable filesystem, have a look
> at HBase.
>
> We tackled the small files problem by using a different technology.  I have
> an image store of about 120 million+ small-file images.  I needed a
> "mountable" filesystem which was POSIX compliant and ended up doing a ZFS
> setup - using the built-in replication to create a few identical copies on
> different servers for both load balancing and reliability.  So we update
> one server and then have a few read-only copies serving the data.  Changes
> get replicated, at a block level, every few minutes.
>
> thanks,
> liam
>
>
> On Thu, Jan 29, 2015 at 4:29 AM, Matan Safriel <dev.matan at gmail.com>
> wrote:
>
>> Hi,
>>
>> Is glusterfs much better than hdfs for the many small files scenario?
>>
>> Thanks,
>> Matan

John_Salinas at Dell.com

2015-Jan-29 20:54 UTC

[Gluster-users] Small files

I suppose some of it depends on what you consider a small file, how many of
them there are, the type of operation (read, write, sequential, etc.), and
what the performance expectations are.

I had looked at ZFS replication also, but Red Hat support was a problem.

Does anyone know if a beta is available for:
http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf

Are there any benchmarks published comparing nfs client mount vs. gluster fuse
vs. boost (for apache)?   I know this is very current but
https://s3.amazonaws.com/aws001/guided_trek/Performance_in_a_Gluster_Systemv6F.pdf
(pages 19-20) has some data for small files. From that doc: "As can be seen in
Figure 8 below, Gluster delivers good single storage node performance for a
variety of small file operations. Generally speaking, Gluster Native FUSE will
deliver better small file performance than Gluster NFS, although Gluster NFS is
often better for very small block sizes. Perhaps most important, IOPS
performance in Gluster scales out just as throughput performance scales out."
This is from 2013 and has a few suggestions to try:
https://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf
If anyone has newer information that would be appreciated.

-john




Liam Slusser

2015-Jan-30 02:28 UTC

[Gluster-users] Small files

Matan -

We replicate to two nodes.  But since a zfs send | zfs recv communicates
one-way, I'd think you could do as many as you want.  It just might take a
little bit longer - although you should be able to run multiple at a time
as long as you have enough bandwidth over the network.  Ours are connected
via a dedicated 10-gigabit network and see around 4-5 Gbit/sec on a large
commit.  How long the replication job takes depends on how much has changed
between the two snapshots.
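
As a rough sketch of what one such replication round could look like (this is
not the actual setup described above; the dataset name, snapshot names and
replica hostnames are hypothetical, and the script only builds the command
lines rather than running them), a snapshot is taken on the writable server
and the difference from the previous snapshot is piped to each read-only
replica:

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, replicas):
    """Build the shell commands for one incremental replication round.

    All arguments are illustrative placeholders. Nothing is executed here;
    the caller decides how (and whether) to run the returned strings.
    """
    prev_ref = shlex.quote(f"{dataset}@{prev_snap}")
    new_ref = shlex.quote(f"{dataset}@{new_snap}")
    target = shlex.quote(dataset)

    # 1. Take a new snapshot on the server that receives the updates.
    commands = [f"zfs snapshot {new_ref}"]

    # 2. Send only the blocks changed since the previous snapshot to each
    #    replica. "zfs recv -F" rolls the replica back to its latest
    #    snapshot before applying the incremental stream.
    for host in replicas:
        send = f"zfs send -i {prev_ref} {new_ref}"
        recv = f"ssh {shlex.quote(host)} zfs recv -F {target}"
        commands.append(f"{send} | {recv}")
    return commands


if __name__ == "__main__":
    for line in replication_commands(
        "tank/images", "snap-0001", "snap-0002", ["replica1", "replica2"]
    ):
        print(line)
```

Because each send is independent of the others, the per-replica pipelines can
be run one after another or in parallel, bandwidth permitting.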

Even though the seek time with an SSD is quick, you'll still get far greater
throughput with sequential reads/writes than with small random accesses.

You can test it yourself.  Create a directory with 100 64MB files and
another directory with 64,000 100K files.  Now copy each from one place to
another and see for yourself which is faster.  Sequential reading always
wins.  And this is true with both Gluster and HDFS.
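
A minimal sketch of that test, for anyone who wants to script it.  The helper
names are made up for illustration, and the default run is scaled down to
8 MiB per directory so it finishes quickly; pass (100, 64 MB) and
(64,000, 100 KB) to reproduce the sizes suggested above.  On a small run the
page cache will hide most of the disk cost, so treat the numbers as indicative
only, and point the working directory at the filesystem you actually want to
measure.

```python
import os
import shutil
import tempfile
import time


def make_files(directory, count, size_bytes):
    """Create `count` files of `size_bytes` each, filled with random data."""
    os.makedirs(directory, exist_ok=True)
    payload = os.urandom(size_bytes)
    for i in range(count):
        with open(os.path.join(directory, f"file_{i:06d}.bin"), "wb") as fh:
            fh.write(payload)


def timed_copy(src, dst):
    """Copy a whole directory tree and return the elapsed seconds."""
    start = time.perf_counter()
    shutil.copytree(src, dst)
    return time.perf_counter() - start


def compare(large, small, workdir=None):
    """Time copying a few large files vs. many small ones.

    `large` and `small` are (file_count, bytes_per_file) pairs. `workdir`
    is where the temporary data lives (defaults to the system temp dir).
    """
    results = {}
    with tempfile.TemporaryDirectory(dir=workdir) as work:
        for label, (count, size) in (("large", large), ("small", small)):
            src = os.path.join(work, label + "_src")
            dst = os.path.join(work, label + "_dst")
            make_files(src, count, size)
            seconds = timed_copy(src, dst)
            results[label] = {
                "files": len(os.listdir(dst)),
                "bytes": count * size,
                "seconds": seconds,
            }
    return results


if __name__ == "__main__":
    # Scaled-down default: 4 x 2 MiB vs. 512 x 16 KiB (8 MiB each).
    outcome = compare(large=(4, 2 * 1024 * 1024), small=(512, 16 * 1024))
    for label, r in outcome.items():
        print(f"{label}: {r['files']} files, {r['bytes']} bytes, "
              f"{r['seconds']:.3f} s")
```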

In HDFS, small files exacerbate the problem because you need to contact the
NameNode to get the block information and then contact the DataNode to get
the block.  Think of it like this.  Reading 1000 64KB files in HDFS means
1000 requests to the NameNode and 1000 requests to the DataNodes, while
reading one 64MB file is one trip to the NameNode and one trip to the
DataNode to get the same amount of data.
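
The arithmetic here, and the NameNode memory estimate from earlier in the
thread, can be written down as a back-of-the-envelope model.  This is only a
sketch under simplifying assumptions: one NameNode lookup per file, one
DataNode request per block, roughly 150 bytes per NameNode object, and
directories ignored.  The function names are invented for illustration and
real HDFS clients may behave differently.

```python
import math

BYTES_PER_NAMENODE_OBJECT = 150      # rule-of-thumb figure quoted in the thread
DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # 64MB block size quoted in the thread


def namenode_memory_bytes(num_files, blocks_per_file=1):
    """Estimate NameNode heap used by file and block objects.

    Each file contributes one file object plus one object per block.
    """
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_NAMENODE_OBJECT


def read_requests(num_files, file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return (namenode_requests, datanode_requests) to read every file once."""
    blocks_per_file = max(1, math.ceil(file_size / block_size))
    return num_files, num_files * blocks_per_file


if __name__ == "__main__":
    gigabytes = namenode_memory_bytes(10_000_000) / 1e9
    print(f"10 million one-block files: about {gigabytes:.1f} GB of NameNode memory")
    print("1000 x 64KB files:", read_requests(1000, 64 * 1024))
    print("1 x 64MB file:    ", read_requests(1, 64 * 1024 * 1024))
```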

You can read more about this issue here:
http://blog.cloudera.com/blog/2009/02/the-small-files-problem/

thanks,
liam


Vijay Bellur

2015-Feb-01 22:31 UTC

[Gluster-users] Small files

On 01/29/2015 09:54 PM, John_Salinas at Dell.com wrote:
> I suppose some of it may depend on what you consider a small file and
> how many of them there are and the operation read/write/sequential/etc
> as well as the performance expectations are.
>
Agreed, performance is subjective, and expectations and behavior vary across
deployments.  It would be good to understand from various deployments
which operations seem to be slower for small files. In my experience
and tests, performance problems seem to be associated more with
create/write and readdir operations. Are there other operations that have
been slow for you?

> I had looked at zfs replication also but redhat support was a problem.
>
> Does anyone know if a beta is available for:
> http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf
>
Some of the enhancements listed are going to land in 3.7. Once the 3.7
feature freeze happens in February, you should be able to try out the
improvements. Which enhancements from the list would you consider
important for your use case?
>
> Are there any benchmarks published comparing nfs client mount vs.
> gluster fuse vs. boost (for apache)?   I know this is very current but
> https://s3.amazonaws.com/aws001/guided_trek/Performance_in_a_Gluster_Systemv6F.pdf
> (pages 19-20) has some data for small files. From that doc: "As can be
> seen in Figure 8 below, Gluster delivers good single storage node
> performance for a variety of small file operations. Generally speaking,
> Gluster Native FUSE will deliver better small file performance than
> Gluster NFS, although Gluster NFS is often better for very small block
> sizes. Perhaps most important, IOPS performance in Gluster scales out
> just as throughput performance scales out."  This is from 2013 and has a
> few suggestions to try:
> https://rhsummit.files.wordpress.com/2013/07/england_th_0450_rhs_perf_practices-4_neependra.pdf

There is a similar presentation from the 2014 Red Hat Summit at [1].

Regards,
Vijay

[1] https://rhsummit.files.wordpress.com/2014/04/bengland_h_1100_rhs_performance.pdf
