Hi!

I've deployed GlusterFS on 2 nodes in replication mode and run some tests. The performance drop was significant, and I have no idea whether it can be improved. The volume was mounted on one of the nodes using both the FUSE and NFS GlusterFS clients. 1.1 GB of small files are stored on the volume. To keep the commands short, M is used as the mountpoint label. "Native" performance is that of the same command issued on the native filesystem of one of the bricks. There was no other activity on either node.

Here are the results per command:

dd if=/dev/zero of=M/tmp bs=1M count=16384
    69.2 MB/sec (native), 69.2 MB/sec (FUSE), 52 MB/sec (NFS)
dd if=/dev/zero of=M/tmp bs=1K count=163840000
    88.1 MB/sec (native), 1.1 MB/sec (FUSE), 52.4 MB/sec (NFS)
time tar cf - M | pv > /dev/null
    15.8 MB/sec (native), 3.48 MB/sec (FUSE), 254 KB/sec (NFS)

I use the default configuration; no adjustments to /etc/gluster* were made. The bricks use ext4.
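For reference, a minimal sketch of the kind of small-file loop that exposes the effect, assuming a writable mountpoint is passed as an argument; the directory name, file count and file size below are placeholder choices, not part of the test above:

    #!/bin/bash
    # Time the creation of many small files on a given mountpoint.
    # Run once per mount (brick FS, FUSE mount, NFS mount) and compare.
    M=${1:?usage: $0 <mountpoint>}
    DIR="$M/smallfile-test"
    mkdir -p "$DIR"

    N=10000                       # number of files to create (placeholder)
    time for i in $(seq 1 "$N"); do
        # one 4 KiB file per iteration; on a replicated volume every
        # create/write/close round-trips to both replicas
        dd if=/dev/zero of="$DIR/f$i" bs=4k count=1 2>/dev/null
    done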
Yongjoon Kong (공용준) / Cloud Computing Business Team
2011-Jan-14 04:53 UTC
[Gluster-users] very bad performance on small files
Maybe it's related to creating the metadata; I have the same issue here. When I untar the kernel source (which has 34,000 files), it takes over 15 minutes on a replicated Gluster volume, but not more than 1 minute on a local filesystem. Maybe this is because when Gluster creates metadata using DHT, it has to contact the bricks to find the remaining free space, as 'du' would; maybe that is the problem. But I don't know how to improve the performance.

Best Regards,
Andrew

Cloud Computing Business Team
Andrew Kong, Manager | andrew.kong at sk.com | T: +82-2-6400-4328 | M: +82-010-8776-5025
SK u-Tower, 25-1, Jeongja-dong, Bundang-gu, Seongnam-si, Gyeonggi-do, 463-844, Korea
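A rough way to reproduce that comparison; the kernel tarball version and the target paths below are placeholders:

    # untar the same source tree onto the replicated volume and onto a
    # local filesystem, and compare the elapsed times
    mkdir -p /mnt/gluster/src /tmp/src
    time tar xjf linux-2.6.37.tar.bz2 -C /mnt/gluster/src   # replicated volume
    time tar xjf linux-2.6.37.tar.bz2 -C /tmp/src           # local filesystem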
Hello Gluster gurus,

I'm trying to find out what performance data you could get while trying an eDiscovery search application in a namespace with over 3 billion small files on GlusterFS.

Thanks, and have a good weekend,

Henry Pan
Sr. Data Storage Eng/Adm, Iron Mountain
650-962-6184 (o) | 650-930-6544 (c)
Henry.pan at ironmountain.com

------------------------------
Message: 1
Date: Fri, 14 Jan 2011 22:50:37 +0100
From: Marcus Bointon <marcus at synchromedia.co.uk>
Subject: Re: [Gluster-users] very bad performance on small files

On 14 Jan 2011, at 18:58, Jacob Shucart wrote:

> This kind of thing is fine on local disks, but when you're talking about a
> distributed filesystem the network latency starts to add up, since one
> request to the web server results in a bunch of file requests.

I think the main objection is that it takes a huge amount of network latency to explain an overhead of more than 1,500% with only 2 machines.

On 14 Jan 2011, at 15:20, Joe Landman wrote:

> MB size or larger

So does Gluster become abruptly faster when file sizes cross some threshold, or are average speeds proportional to file size? It would be good to see a wider spread of values on benchmarks of throughput vs file size for the same overall volume (like Max's data, but with more intermediate values).

Marcus
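A rough sketch of the sweep Marcus describes: write the same total volume as files of increasing size and report the elapsed time per size. The mountpoint, total volume and size steps below are placeholder choices, not anything that was actually run:

    #!/bin/bash
    # throughput-vs-file-size sweep: same total data, varying file size
    M=${1:?usage: $0 <mountpoint>}
    TOTAL_KB=$((64 * 1024))             # 64 MiB written per data point

    for size_kb in 1 4 16 64 256 1024; do
        dir="$M/sweep-${size_kb}k"
        mkdir -p "$dir"
        n=$(( TOTAL_KB / size_kb ))     # file count so each run writes TOTAL_KB
        start=$(date +%s)
        for i in $(seq 1 "$n"); do
            dd if=/dev/zero of="$dir/f$i" bs="${size_kb}k" count=1 2>/dev/null
        done
        elapsed=$(( $(date +%s) - start ))
        echo "${size_kb} KiB files: ${n} files in ${elapsed} s"
    done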
------------------------------
Message: 2
Date: Fri, 14 Jan 2011 17:12:01 -0500
From: Joe Landman <landman at scalableinformatics.com>
Subject: Re: [Gluster-users] very bad performance on small files

On 01/14/2011 04:50 PM, Marcus Bointon wrote:

> I think the main objection is that it takes a huge amount of network
> latency to explain an overhead of more than 1,500% with only 2 machines.

If most of your file access times are dominated by latency (e.g. small, seeky loads) and you are going over a gigabit connection, then yes, your performance is going to crater on any cluster filesystem.

Local latency to traverse the storage stack is on the order of tens of microseconds. Physical latency of the disk medium is on the order of tens of microseconds for a RAM disk, hundreds of microseconds for flash/SSD, and thousands of microseconds (i.e. milliseconds) for spinning rust.

Now take one million small-file writes, say 1024 bytes each. Those million writes have to traverse the storage stack in the kernel to get to disk. Now add a network latency event on the order of thousands of microseconds for the remote storage and network stacks to respond.

I haven't measured it methodically yet, but I wouldn't be surprised to see IOP rates within a factor of 2 of bare metal on a sufficiently fast network such as InfiniBand, and within a factor of 4 or 5 on a slow network like gigabit Ethernet.

Our experience has generally been that you are IOP-constrained because of the stack you have to traverse. Adding latency to that stack means more to traverse and more to wait for, which has a magnifying effect on the times for small, seeky I/O operations (stat, small writes, random ops).

> So does Gluster become abruptly faster when file sizes cross some
> threshold, or are average speeds proportional to file size?

It's a continuous curve, and very much specific to the user's load. The fewer seeky operations you do, the better (this is true of all cluster filesystems).

> It would be good to see a wider spread of values on benchmarks of
> throughput vs file size for the same overall volume (like Max's data,
> but with more intermediate values).

I haven't seen Max's data, so I can't comment on this. Understand that performance is going to be bound by many things. One of them is the speed of the spinning disk, if that's what you use. Another is the network.

--
Joseph Landman, Ph.D
Founder and CEO, Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
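As a sanity check on those orders of magnitude, a back-of-envelope sketch; the latency values are the rough figures quoted above, not measurements:

    #!/bin/bash
    # one million 1 KiB synchronous writes, each paying a fixed
    # per-operation latency; compute total time and effective throughput
    N=1000000

    for lat_us in 10 100 1000 5000; do   # local stack / SSD / disk / GigE round trip
        total_s=$(( N * lat_us / 1000000 ))
        [ "$total_s" -eq 0 ] && total_s=1            # avoid divide-by-zero
        echo "${lat_us} us/op -> ~${total_s} s total, ~$(( N / total_s )) KiB/s"
    done

At roughly 1 ms per operation this lands at about 1,000 KiB/s, right around the ~1.1 MB/sec that Max measured for the bs=1K dd over the FUSE mount.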
------------------------------
Message: 3
Date: Fri, 14 Jan 2011 22:19:58 +0000
From: Max Ivanov <ivanov.maxim at gmail.com>
Subject: Re: [Gluster-users] very bad performance on small files

> I haven't seen Max's data, so I can't comment on this. Understand that
> performance is going to be bound by many things. One of them is the speed
> of the spinning disk, if that's what you use. Another is the network.

It is very similar to a kernel source tree: tons of small (2-20 KB) files, 1.1 GB in total.

------------------------------
Message: 4
Date: Fri, 14 Jan 2011 17:20:58 -0500
From: Joe Landman <landman at scalableinformatics.com>
Subject: Re: [Gluster-users] very bad performance on small files

On 01/14/2011 05:19 PM, Max Ivanov wrote:

> It is very similar to a kernel source tree: tons of small (2-20 KB)
> files, 1.1 GB in total.

OK, worth looking into.

--
Joe Landman, Scalable Informatics Inc.

------------------------------
Message: 5
Date: Sat, 15 Jan 2011 00:26:53 +0100
From: Marcus Bointon <marcus at synchromedia.co.uk>
Subject: Re: [Gluster-users] very bad performance on small files

On 14 Jan 2011, at 23:12, Joe Landman wrote:

> If most of your file access times are dominated by latency (e.g. small,
> seeky loads) and you are going over a gigabit connection, then yes, your
> performance is going to crater on any cluster filesystem. [...] Adding
> latency to that stack means more to traverse and more to wait for, which
> has a magnifying effect on the times for small, seeky I/O operations
> (stat, small writes, random ops).

Sure, and all of that applies equally to both NFS and Gluster, yet in Max's example NFS was ~50x faster than Gluster for an identical small-file workload. So what is Gluster doing, over and above what NFS does, that takes so long, given that the network and disk factors are equal? I'd buy a factor of 2 for replication, but not 50.
In case you missed what I'm on about, these are the stats Max posted:

> Here are the results per command:
> dd if=/dev/zero of=M/tmp bs=1M count=16384
>     69.2 MB/sec (native), 69.2 MB/sec (FUSE), 52 MB/sec (NFS)
> dd if=/dev/zero of=M/tmp bs=1K count=163840000
>     88.1 MB/sec (native), 1.1 MB/sec (FUSE), 52.4 MB/sec (NFS)
> time tar cf - M | pv > /dev/null
>     15.8 MB/sec (native), 3.48 MB/sec (FUSE), 254 KB/sec (NFS)

In my case I'm running 30k-IOPS SSDs over gigabit. At the moment my problem (running 3.0.6) isn't performance but reliability: files are occasionally reported as 'vanished' by front-end apps (like rsync) even though they are present on both backing stores; there are no errors in the Gluster logs, and self-heal doesn't help.

Marcus

------------------------------
Message: 6
Date: Fri, 14 Jan 2011 18:51:39 -0500
From: Joe Landman <landman at scalableinformatics.com>
Subject: Re: [Gluster-users] very bad performance on small files

On 01/14/2011 06:26 PM, Marcus Bointon wrote:

> Sure, and all of that applies equally to both NFS and Gluster, yet in
> Max's example NFS was ~50x faster than Gluster for an identical
> small-file workload. So what is Gluster doing, over and above what NFS
> does, that takes so long, given that the network and disk factors are
> equal? I'd buy a factor of 2 for replication, but not 50.

If the NFS client was doing attribute caching while the GlusterFS mount had stat-prefetch and the other caching translators turned off, that could explain it.

> In case you missed what I'm on about, these are the stats Max posted: [...]

OK, I'm not sure I had seen the numbers before. Thanks.

> In my case I'm running 30k-IOPS SSDs over gigabit. At the moment my
> problem (running 3.0.6) isn't performance but reliability: files are
> occasionally reported as 'vanished' by front-end apps (like rsync) even
> though they are present on both backing stores; there are no errors in
> the Gluster logs, and self-heal doesn't help.

Check your stat-prefetch settings, and your time base. We've seen some strange issues that appear to be correlated with drifting time bases, including files disappearing; we have a few open tickets on this. The way we've worked around the problem is to abandon the NFS client and use the GlusterFS client. Not our preferred option, but it provides a workaround for the moment. The NFS translator does appear to have a few issues; I'm hoping we get more tuning knobs for it soon so we can work around this.

Regards,
Joe

--
Joseph Landman, Scalable Informatics Inc.
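For what it's worth, those knobs can be inspected and toggled from the gluster CLI. A hedged sketch: "myvol" is a placeholder, and the option names should be verified against your release with "gluster volume set help":

    gluster volume info myvol                               # show current options
    gluster volume set myvol performance.stat-prefetch on   # stat caching
    gluster volume set myvol performance.quick-read on      # small-file reads
    gluster volume set myvol performance.io-cache on        # read caching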
------------------------------
Message: 7
Date: Sat, 15 Jan 2011 00:30:15 +0000
From: Max Ivanov <ivanov.maxim at gmail.com>
Subject: Re: [Gluster-users] very bad performance on small files

> Sure, and all of that applies equally to both NFS and Gluster, yet in
> Max's example NFS was ~50x faster than Gluster for an identical
> small-file workload. So what is Gluster doing, over and above what NFS
> does, that takes so long, given that the network and disk factors are
> equal? I'd buy a factor of 2 for replication, but not 50.

Sorry if I didn't make it clear, but the "NFS" in my tests is not the well-known classic NFS; it is GlusterFS in NFS mode.

------------------------------
Message: 8
Date: Sat, 15 Jan 2011 11:18:22 +0200
From: Rudi Ahlers <Rudi at SoftDux.com>
Subject: Re: [Gluster-users] very bad performance on small files

On Fri, Jan 14, 2011 at 7:58 PM, Jacob Shucart <jacob at gluster.com> wrote:

> For web hosting it is best to put user-generated content (images, etc.) on
> Gluster, but to leave application files like PHP files on the local disk.
> This is because a single application file request can result in 20 other
> file requests, since applications like PHP use includes/inherits, etc.
> This kind of thing is fine on local disks, but when you're talking about a
> distributed filesystem the network latency starts to add up, since one
> request to the web server results in a bunch of file requests.
>
>>> Gluster - and in fact most (all?) parallel filesystems - are optimized
>>> for very large files. That being the case, small files are not retrieved
>>> as efficiently, and they result in a larger number of file operations in
>>> total because there is a fixed number for each file accessed.
>>
>> Which makes GlusterFS performance unacceptable for web hosting purposes =(

So what can one use for web hosting purposes? We use Xen/KVM virtual machines hosted on NAS devices, but the NAS devices don't have an easy upgrade path: we literally have to rsync all the data to the new device, then shut down all the machines on the old one and restart them on the new one. They don't provide 100% uptime either. So I'm looking for something with an easier upgrade path (GlusterFS can do this) and better uptime (again, GlusterFS can do this). But it's clear that GlusterFS isn't made for small files, so what else could work well for us?
--
Kind Regards
Rudi Ahlers
SoftDux
Website: http://www.SoftDux.com
Technical Blog: http://Blog.SoftDux.com
Office: 087 805 9573
Cell: 082 554 7532

------------------------------

End of Gluster-users Digest, Vol 33, Issue 23