Ken Randall
2011-Jul-18 00:56 UTC
[Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
I'll try to keep it brief. I've been testing GlusterFS for the last month or so. My production setup will be more complex than what I'm listing below, but I've whittled things down to the point where the setup below still causes the problem.

I'm running GlusterFS 3.2.2 on two CentOS 5.6 boxes in a replicated volume. I am connecting to it with a Windows Server 2008 R2 box over an SMB share. Basically, the web app portion runs locally on the Windows box, but content (e.g. HTML templates, images, CSS files, JS, etc.) is being pulled from the Gluster volume.

I've performed a fair degree of load testing on the setup so far, scaling up the load to nearly four times what our normal production environment sees in primetime, and it seems to handle it fine. We run tens of thousands of websites, so it is pretty significant that it's able to handle that.

However, a different suite of tests includes a Page of Death, which contains tens of thousands of image references on a single page. All I have to do is load that page for a few seconds, and it will grind my web server's SMB connection to a near complete standstill. I can close the browser after just a few seconds, and it still takes several minutes for the web server to respond to any requests at all. Connecting to the share over Explorer is extremely slow from that same machine. (I can connect from another machine to that same share, which is an export of the same exact GlusterFS mount, and it is just fine. Similarly, accessing the Gluster mount on the Linux boxes shows zero problems at all; it's as happy to respond to requests as ever.)

Even if I scale it out to a swath of web servers, loading that single page, one time, for just a few seconds will freeze every single web server, making every website on the system inaccessible.

You may be asking, why am I asking here instead of on a Samba group, or even a Windows group?
Here's why: My control is that I have a Windows file server that I can swap in Gluster's place, and I'm able to load that page without it blinking an eye (it actually becomes a test of the computer that the browser is on). It does not affect any of the web servers in the slightest. My second control is that I have exported the raw Gluster data directory as an SMB share (with the same exact Samba configuration as the Gluster one), and it performs as well as the Windows file server. I can load the Page of Death with no consequence.

I've pushed IO-threads all the way to the maximum of 64 without any benefit. I can't see anything noteworthy in the Gluster or Samba logs, but perhaps I am not sure what to look for.

Thank you to anybody who can point me in the right direction. I am hoping I don't have to dive into Wireshark or tcpdump territory, but I'm open if you can guide the way! ;)

Ken
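Before reaching for Wireshark, one way to narrow this down from the Linux side might be to time many small stat-and-read operations on the Gluster mount and on the raw data directory, since the Page of Death is essentially tens of thousands of small file fetches. The following is only a minimal Python sketch of that comparison, not part of any Gluster or Samba tooling. The function names `time_small_reads` and `summarize` are hypothetical, and the directories to compare are whatever paths you pass on the command line.

```python
import os
import statistics
import sys
import time


def time_small_reads(directory, limit=10000):
    """Stat and fully read up to `limit` regular files directly inside
    `directory`, returning one latency (in seconds) per file."""
    latencies = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if len(latencies) >= limit:
                break
            # Skip subdirectories, symlinks and other non-regular entries.
            if not entry.is_file(follow_symlinks=False):
                continue
            start = time.perf_counter()
            os.stat(entry.path)
            with open(entry.path, "rb") as handle:
                handle.read()
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Reduce a list of latencies to a few comparable numbers."""
    if not latencies:
        return {"count": 0, "median_s": 0.0, "max_s": 0.0, "total_s": 0.0}
    return {
        "count": len(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
        "total_s": sum(latencies),
    }


if __name__ == "__main__":
    # Each argument is a directory to measure, for example the Gluster
    # mount point and the raw data directory (paths are up to you).
    for path in sys.argv[1:]:
        print(path, summarize(time_small_reads(path)))
```

If the median or maximum latency on the Gluster mount is far above the raw directory under this kind of access, that would point at the Gluster client path rather than at Samba itself. If both look similar, the slowdown is more likely in how Samba drives the mount.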
Joe Landman
2011-Jul-18 01:02 UTC
[Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
On 07/17/2011 08:56 PM, Ken Randall wrote:
> You may be asking, why am I asking here instead of on a Samba group, or
> even a Windows group? Here's why: My control is that I have a Windows
> file server that I can swap in Gluster's place, and I'm able to load
> that page without it blinking an eye (it actually becomes a test of the
> computer that the browser is on). It does not affect any of the web
> servers in the slightest. My second control is that I have exported
> the raw Gluster data directory as an SMB share (with the same exact
> Samba configuration as the Gluster one), and it performs equally as well
> as the Windows file server. I can load the Page of Death with no
> consequence.

NTFS with SMB sharing caches everything. The first page load may take a bit of time, but subsequent loads will run from data stored in RAM. You can adjust SMB caching and Gluster caching as needed.

> I've pushed IO-threads all the way to the maximum 64 without any
> benefit. I can't see anything noteworthy in the Gluster or Samba logs,
> but perhaps I am not sure what to look for.

Not likely your issue. More probably it's the Gluster cache size, coupled with some CIFS tuning you need.

> Thank you to anybody who can point me the right direction. I am hoping
> I don't have to dive into Wireshark or tcpdump territory, but I'm open
> if you can guide the way! ;)

You might need to strace -p the slow server processes. It would help to know what calls they are stuck on.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
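To see which calls the slow processes are stuck on, one option is to attach strace with per-call timing and write the trace to a file, e.g. `strace -f -T -p <pid> -o smbd.trace` (here `<pid>` stands for the smbd or glusterfs process and the file name is arbitrary), then total the time per syscall. The Python sketch below is one hedged way to do that tally. It assumes the usual `-T` line shape, `name(args) = result <seconds>`, with an optional leading pid; the function name `summarize_strace` is hypothetical, and lines it does not recognize are simply skipped.

```python
import re
import sys
from collections import defaultdict

# Matches completed calls such as:
#   1234 open("/some/path", O_RDONLY) = 7 <0.001500>
# and resumed calls such as:
#   1234 <... read resumed> ..., 4096) = 4096 <0.250000>
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s*|\d+\s+)?"          # optional pid prefix
    r"(?:(?P<name>\w+)\(|<\.\.\. (?P<resumed>\w+) resumed>)"
    r".*<(?P<secs>\d+\.\d+)>\s*$"              # trailing <seconds> from -T
)


def summarize_strace(lines):
    """Return (syscall, count, total_seconds) rows, slowest total first."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            # Signals, unfinished calls and exit lines carry no timing.
            continue
        name = match.group("name") or match.group("resumed")
        totals[name][0] += 1
        totals[name][1] += float(match.group("secs"))
    rows = [(name, count, secs) for name, (count, secs) in totals.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    # Usage: python summarize_strace.py smbd.trace
    with open(sys.argv[1], "r", errors="replace") as trace:
        for name, count, secs in summarize_strace(trace):
            print(f"{name:20s} calls={count:8d} total={secs:10.6f}s")
```

A syscall that dominates the total while the count stays modest suggests individual calls are blocking. A huge count with small per-call times suggests sheer request volume instead. strace's own `-c` option gives a similar summary without any post-processing, at the cost of losing the per-call detail.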
Whit Blauvelt
2011-Jul-18 01:06 UTC
[Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS
On Sun, Jul 17, 2011 at 07:56:57PM -0500, Ken Randall wrote:
> However, as a part of a different suite of tests is a Page of Death, which
> contains tens of thousands of image references on a single page.

Off-topic response: Is there ever, in real production, any page, anywhere, that contains tens of thousands of image references? I'm all for testing at the extreme, and for capacity that goes far beyond what's needed for practical purposes. Is that what this is, or do you anticipate real-life Page o' Death scenarios?

Closer to the topic: What's going on with the load on the various systems? On the Linux side, have you watched each of them with something like htop?

Whit