"Small files" is sort of a misconception. Initial file ops include a
small amount of overhead, with a lookup, the filename is hashed, the dht
subvolume is selected and the request is sent to that subvolume. If it's a
replica, the request is sent to each replica in that subvolume set (usually 2).
If it is a replica, all the replicas have to respond. If one or more have
pending flags or there's an attribute mismatch, either some self heal action
has to take place, or a split-brain is determined. If the file doesn't exist
on that subvolume, the same must be done to all the subvolumes. If the file is
found, a link file is made on the expected dht subvolume pointing to the place
we found the file. This will make finding it faster the next time. Once the file
is found and is determined to be clean, the file system can move on to the next
file operation.
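
To make that flow concrete, here is a rough sketch in Python. It is an
illustration only; the data structures and function names are invented
here, and the real implementation is C code in the distribute and
replicate translators.

    import hashlib

    def dht_hash(path, n):
        # stand-in for the real elastic hash: pick a subvolume from the name
        return int(hashlib.md5(path.encode()).hexdigest(), 16) % n

    def lookup_replica_set(subvol, path):
        # subvol is a list of replicas; each replica maps path -> attrs string
        replies = [replica.get(path) for replica in subvol]   # usually 2 replicas
        found = [a for a in replies if a is not None]
        if len(set(found)) > 1:
            # (pending changelog flags are omitted in this sketch)
            print("attribute mismatch on %s: self-heal or split-brain" % path)
        return found[0] if found else None

    def lookup(path, subvolumes, link_files):
        expected = dht_hash(path, len(subvolumes))
        attrs = lookup_replica_set(subvolumes[expected], path)
        if attrs is None:
            # not where the hash says it should be: ask every subvolume
            for i, subvol in enumerate(subvolumes):
                attrs = lookup_replica_set(subvol, path)
                if attrs is not None:
                    # leave a pointer on the expected subvolume so the next
                    # lookup is a single hop again
                    link_files[(expected, path)] = i
                    break
        return attrs
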
PHP applications, specifically, normally have a lot of small files that are
opened for every page request, so per page that overhead adds up. PHP also
queries a lot of files that just don't exist: a single page might query 200
files that just aren't there, because they're in a different portion of the
search path, belong to a plugin that's not used, etc.
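
As a back-of-the-envelope example (the numbers here are made up, just to
show the shape of the problem): if each lookup costs about 1 ms of network
round trips and a page touches 200 files, that's 200 x 1 ms = 200 ms per
page spent on lookups alone, before any data is read, and misses cost even
more because they fan out to every subvolume.
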
NFS mitigates that effect by using FS-Cache in the kernel. It caches
directories and stats, avoiding the call to the actual filesystem. This
also means, of course, that the image that was just uploaded through a
different server isn't going to exist on this one until the cache times
out. Stale data is to be expected on a caching client in a multi-client
system.
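
How long that window of staleness lasts is governed by the standard NFS
client caching options, for example (server name, volume and values here
are illustrative only):

    mount -t nfs -o vers=3,actimeo=30 server:/myvol /mnt/myvol
    # actimeo=30 caches file and directory attributes for up to 30 seconds;
    # smaller values mean fresher data but more lookups hitting the servers.
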
Jeff Darcy created a test translator that caches negative lookups, which he
said also mitigated the PHP problem pretty nicely.
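
The idea is simple enough to sketch (this is just the concept in Python,
not his translator, which would be C code inside glusterfs):

    import time

    NEGATIVE_TTL = 5.0        # seconds to remember "this path does not exist"
    _negative = {}            # path -> time the miss was recorded

    def cached_lookup(path, real_lookup):
        ts = _negative.get(path)
        if ts is not None and time.time() - ts < NEGATIVE_TTL:
            return None                       # answer the miss from the cache
        attrs = real_lookup(path)             # the expensive cluster-wide lookup
        if attrs is None:
            _negative[path] = time.time()     # remember the miss for a while
        else:
            _negative.pop(path, None)
        return attrs
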
If you have control over your app, things like using absolute paths in PHP
includes or leaving file descriptors open can also avoid that overhead, as
can reducing the number of times you open a file and the number of files
you open in the first place.
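
For instance, loading something like a config file once at process start
and keeping it in memory, instead of reopening it on every request, removes
that lookup from the per-request cost entirely. A generic sketch (Python
rather than PHP, and the path is just a made-up example):

    import json

    # hypothetical config file, opened once at startup rather than per request
    CONFIG_PATH = "/var/www/app/config.json"
    with open(CONFIG_PATH) as f:
        CONFIG = json.load(f)

    def handle_request(name):
        # reuses the in-memory copy; no lookup or open hits the filesystem here
        return "%s, %s" % (CONFIG.get("greeting", "hello"), name)
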
So "small files" refers to the percent of total file op time
that's spent on overhead vs actual data retrieval.
Chandan Kumar <chandank.kumar at gmail.com> wrote:
>Hello All,
>
>I am new to gluster and evaluating it for my production environment. After
>reading some blogs and googling I learned that NFS mount at clients give
>better read performance for small files and the glusterfs/FUSE mount gives
>better for large write operations.
>
>Now my questions are
>
>1) What do we mean by small files? 1KB/1MB/1GB?
>2) If I am using NFS mount at the client I am most likely loosing the high
>availability feature of gluster. unlike fuse mount where if primary goes
>down I don't need to worry about availability.
>
>Basically my production environment will mostly have read operations of
>files ranging from 400KB to 5MB and they will be concurrently read by
>different threads.
>
>Thanks,
>Chandan
>