Hi All,

I complained about the low file creation rate of glusterfs on my cluster a few weeks ago, and Avati suggested I start with a small number of nodes. I finally got some time to seriously benchmark glusterfs with Bonnie++ today, and the results confirm that glusterfs is indeed slow at file creation. My application stores a large number of ~200KB image files. I use the following bonnie++ command for evaluation (create 10K files of 200000 bytes each, scattered under 100 directories):

    bonnie++ -d . -s 0 -n 10:200000:200000:100

Since sequential I/O is not that interesting to me, I only keep the random I/O results.

My hardware configuration is 2x quad-core Xeon E5430 2.66GHz, 16GB memory, and 4x Seagate 1500GB 7200RPM hard drives. The machines are connected with gigabit ethernet.

I ran several GlusterFS configurations, each named N-R-T, where N is the number of replicated volumes aggregated, R is the replication count, and T is the number of server-side I/O threads. One machine serves one volume, so there are NxR servers and one separate client running for each experiment. On the client side, the server volumes are first replicated and then aggregated -- even in the 1-1-2 configuration, the single volume is wrapped by a replicate and a distribute translator. To show the overhead of those translators, I also ran a "simple" configuration, which is 1-1-2 without the extra replicate & distribute translators, and a "local" configuration, which is "simple" with client & server running on the same machine. These configurations are compared against "nfs" and "nfs-local", the latter being NFS with server and client on the same machine. The GlusterFS volume file templates are attached to the email.

The result is at http://www.cs.princeton.edu/~wdong/gluster/summary.gif. The bars/numbers shown are operations/second, so larger is better.

These are the takeaways from the figure:
1. GlusterFS does an exceptionally good job of deleting files, but creates and reads files much more slowly than both NFS configurations.
2. At least for the one-node server configuration, the network doesn't affect the file creation rate but does affect the file read rate.
3. The extra dummy replicate & distribute translators lower the file creation rate by almost half.
4. Replication doesn't hurt performance much.
5. I'm running only a single-threaded benchmark, so it's hard to say anything about scalability, but adding more servers does help a little even in the single-threaded setting.

Note that my results are not really that different from http://gluster.com/community/documentation/index.php/GlusterFS_2.0_I/O_Benchmark_Results, where the file creation rate in the single-node configuration is about 30/second.

I see no reason why GlusterFS has to be so much slower than NFS at file creation in a single-node configuration. I'm wondering if someone here can help me figure out what's wrong in my configuration or what's wrong in the GlusterFS implementation.
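For readers unfamiliar with bonnie++'s -n syntax, here is what the command above asks for (standard bonnie++ 1.x semantics; the first -n field is a multiple of 1024 files):

    # -d .                     run in the current directory (the mounted volume)
    # -s 0                     skip the sequential-throughput tests entirely,
    #                          so only the file creation/stat/delete tests run
    # -n 10:200000:200000:100  10*1024 = 10240 files,
    #                          max size = min size = 200000 bytes
    #                          (so every file is ~200KB),
    #                          spread evenly across 100 directories
    bonnie++ -d . -s 0 -n 10:200000:200000:100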
- Wei

Server volume:

    volume posix
      type storage/posix
      option directory /state/partition1/wdong/gluster
    end-volume

    volume lock
      type features/locks
      subvolumes posix
    end-volume

    volume brick
      type performance/io-threads
      option thread-count 2
      subvolumes lock
    end-volume

    volume server
      type protocol/server
      option transport-type tcp
      option auth.addr.brick.allow 192.168.99.*
      option transport.socket.listen-port 6999
      subvolumes brick
    end-volume

Client volume:

    volume brick-0-0
      type protocol/client
      option transport-type tcp
      option remote-host c8-0-0
      option remote-port 6999
      option remote-subvolume brick
    end-volume

    volume brick-0-1 ...

    volume rep-0
      type cluster/replicate
      subvolumes brick-0-0 brick-0-1 ...

    ...

    volume union
      type cluster/distribute
      subvolumes rep-0 rep-1 rep-2 rep-3 rep-4 rep-5 rep-6 rep-7
    end-volume

    volume client
      type performance/write-behind
      option cache-size 32MB
      option flush-behind on
      subvolumes union
    end-volume

For those who are interested enough to see the real configuration files, I have all the configuration files and server/client logs uploaded to http://www.cs.princeton.edu/~wdong/gluster/run.tar.gz.
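For reference, with the 2.0.x release volfiles like these would be started and mounted roughly as follows (the paths are illustrative, not the ones from my run):

    # on each server node: start the brick daemon with the server volfile
    glusterfsd -f /etc/glusterfs/server.vol

    # on the client: mount the volume through FUSE
    glusterfs -f /etc/glusterfs/client.vol /mnt/gluster

    # then run the benchmark inside the mount
    cd /mnt/gluster && bonnie++ -d . -s 0 -n 10:200000:200000:100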
The glusterfs version I'm using is 2.0.6.

- Wei
Wei Dong
2009-Sep-11 13:44 UTC
[Gluster-users] very low file creation rate with glusterfs -- result updates
I think it is fuse that causes the slowness. I reran all the experiments with booster enabled, and here's the new figure: http://www.cs.princeton.edu/~wdong/gluster/summary-booster.gif . The numbers are MUCH better than NFS in most cases, except for the local setting, which is not practically interesting. The interesting thing is that all of a sudden the deletion rate drops by a factor of 4-10 -- though I don't really care about file deletion.

I must say that I'm totally satisfied with the results.

- Wei
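For context: booster is an LD_PRELOAD library shipped with GlusterFS 2.0.x that intercepts file I/O in the application and sends it to the bricks directly, bypassing the FUSE kernel round trips that dominate small-file metadata cost. A rough sketch of how a run like the above could be wired up (the library path, the fstab file location, and its exact format are assumptions based on a default 2.0.x install, not taken from my configuration -- check your own installation):

    # booster reads an fstab-style file telling it which client volfile
    # backs which mount point (location and format assumed, not verified)
    export GLUSTERFS_BOOSTER_FSTAB=/etc/glusterfs/booster.fstab

    # preload the booster library so the benchmark's I/O bypasses FUSE
    LD_PRELOAD=/usr/lib/glusterfs/glusterfs-booster.so \
        bonnie++ -d /mnt/gluster -s 0 -n 10:200000:200000:100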