Matan - I'll do my best to take a shot at answering this...
They're completely different technologies. HDFS is not POSIX compliant and
is not a "mountable" filesystem, while Gluster is.
In HDFS land, every file, directory and block is represented as an object
in the namenode's memory, each of which occupies about 150 bytes. Since
each file carries at least a file object plus a block object, 10 million
files will eat up about 3 GB of namenode memory. Furthermore, HDFS was
designed for streaming large files - the default blocksize in HDFS is 64MB.
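If it helps, here's a rough back-of-the-envelope for that memory figure. It
assumes ~150 bytes per namenode object and one block per file, so treat it
as an estimate rather than a measurement:

    # Rough namenode memory estimate - assumes ~150 bytes per object
    # and one block per file (both assumptions, not measured values).
    BYTES_PER_OBJECT = 150

    def namenode_memory_bytes(num_files: int, blocks_per_file: int = 1) -> int:
        # Each file contributes a file object plus one object per block.
        objects = num_files * (1 + blocks_per_file)
        return objects * BYTES_PER_OBJECT

    print(namenode_memory_bytes(10_000_000) / 1e9)  # ~3.0 GB for 10M single-block files
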
Gluster doesn't have a central namenode, so having millions of files
doesn't put a tax on it in the same way. But, again, small files cause
lots of small seeks to handle the replication tasks/checks and generally
aren't very efficient, so don't expect blazing performance... Rebalancing
and rebuilding Gluster bricks can also be extremely painful, since Gluster
isn't a block-level filesystem - it has to read each file one at a time.
If you want to use HDFS and don't need a mountable filesystem, have a look
at HBase.
We tackled the small-files problem by using a different technology. I have
an image store of about 120 million+ small-file images; I needed a
"mountable" filesystem which was POSIX compliant and ended up doing a ZFS
setup, using the built-in replication to create a few identical copies on
different servers for both load balancing and reliability. So we update
one server and then have a few read-only copies serving the data. Changes
get replicated, at a block level, every few minutes.
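Roughly, the replication loop looks something like the sketch below. The
dataset name, hosts and scheduling are made up for illustration - not our
actual config - but the idea (incremental zfs send/receive of snapshots to
read-only mirrors) is the same:

    # Hypothetical sketch of periodic block-level ZFS replication to
    # read-only mirrors. Dataset and host names are placeholders.
    import subprocess

    DATASET = "tank/images"           # hypothetical pool/dataset with the image store
    MIRRORS = ["mirror1", "mirror2"]  # hypothetical read-only replica hosts

    def replicate(prev_snap: str, new_snap: str) -> None:
        # Take a new snapshot on the writable master.
        subprocess.run(["zfs", "snapshot", f"{DATASET}@{new_snap}"], check=True)
        for host in MIRRORS:
            # Send only the blocks changed since the previous snapshot and
            # apply them on the replica over ssh.
            send = subprocess.Popen(
                ["zfs", "send", "-i", f"{DATASET}@{prev_snap}", f"{DATASET}@{new_snap}"],
                stdout=subprocess.PIPE)
            subprocess.run(["ssh", host, "zfs", "receive", "-F", DATASET],
                           stdin=send.stdout, check=True)
            send.stdout.close()
            send.wait()
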
thanks,
liam
On Thu, Jan 29, 2015 at 4:29 AM, Matan Safriel <dev.matan at gmail.com>
wrote:
> Hi,
>
> Is glusterfs much better than hdfs for the many small files scenario?
>
> Thanks,
> Matan
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users at gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>