> To what extent, is Gluster a good choice for the "many small files
> scenario", as opposed to HDFS? Last I checked, hdfs would consume
> humongous memory resources if the cluster has many small files, given
> its architecture. There are some hackish solutions on top HDFS for the
> case of many small files rather than huge files, but it would be nice
> to find a file system that matches that scenario well as is. So I
> wonder how would Gluster do when files are typically small.

We're not as bad as HDFS, but it's still not what I'd call a good
scenario for us. While we have good space efficiency for small files,
and we don't have a single-metadata-server SPOF either, the price we pay
is a hit to our performance for creates (and renames). There are
several efforts under way to improve this, but there's only so much we
can do when directory contents must be consistent across the volume
despite being spread across many bricks (or replica sets). More details
on those efforts are here:
http://www.gluster.org/community/documentation/index.php/Features/Feature_Smallfile_Perf
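
For anyone who wants to gauge the create/rename cost on their own setup, here is a rough sketch of a micro-benchmark. It is only an illustration, not an official Gluster tool: the function name time_small_files, the file count, and the file size are all made up for the example, and you would point it at a directory on your mounted volume instead of the temporary directory used here.

```python
import os
import tempfile
import time


def time_small_files(directory, count=1000, size=4096):
    """Create, rename, and delete `count` small files; return seconds per phase.

    Hypothetical helper for illustration only. `count` and `size` are
    arbitrary defaults, not values recommended by the Gluster project.
    """
    payload = b"x" * size
    timings = {}

    # Phase 1: create many small files in a single directory.
    start = time.perf_counter()
    for i in range(count):
        with open(os.path.join(directory, f"f{i}.tmp"), "wb") as fh:
            fh.write(payload)
    timings["create"] = time.perf_counter() - start

    # Phase 2: rename every file within the same directory.
    start = time.perf_counter()
    for i in range(count):
        os.rename(os.path.join(directory, f"f{i}.tmp"),
                  os.path.join(directory, f"f{i}.dat"))
    timings["rename"] = time.perf_counter() - start

    # Phase 3: delete the files so the directory is left empty.
    start = time.perf_counter()
    for i in range(count):
        os.remove(os.path.join(directory, f"f{i}.dat"))
    timings["delete"] = time.perf_counter() - start

    return timings


if __name__ == "__main__":
    # Assumption: swap this temporary directory for a path on the
    # mounted volume you want to measure.
    with tempfile.TemporaryDirectory() as scratch:
        for phase, seconds in time_small_files(scratch).items():
            print(f"{phase}: {seconds:.3f}s")
```

Running it once against a local disk and once against the mounted volume gives a crude side-by-side of the per-phase times. It does not sync to disk or account for caching, so treat the numbers as a rough comparison only.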