thr3ads.net - Gluster users - [Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS [Jul 2011]

Ken Randall

2011-Jul-18 00:56 UTC

[Gluster-users] not sure how to troubleshoot SMB / CIFS overload when using GlusterFS

I'll try to keep it brief. I've been testing GlusterFS for the last month or
so. My production setup will be more complex than what I'm listing below,
but I've whittled things down to where the setup below reproduces the
problem.

I'm running GlusterFS 3.2.2 on two CentOS 5.6 boxes in a replicated volume.
I am connecting to it with a Windows Server 2008 R2 box over an SMB share.
Basically, the web app portion runs locally on the Windows box, but content
(e.g. HTML templates, images, CSS files, JS, etc.) is being pulled from the
Gluster volume.

I've performed a fair degree of load testing on the setup so far, scaling up
the load to nearly four times what our normal production environment sees in
primetime, and it seems to handle it fine. We run tens of thousands of
websites, so it's pretty significant that it can handle that.

However, a different suite of tests includes a Page of Death, which
contains tens of thousands of image references on a single page. All I have
to do is load that page for a few seconds, and it will grind my web server's
SMB connection to a near-complete standstill. I can close the browser after
just a few seconds, and it still takes several minutes for the web server to
respond to any requests at all. Connecting to the share in Explorer from that
same machine is extremely slow. (I can connect to that same share from
another machine, which is an export of the exact same GlusterFS mount, and it
is just fine. Similarly, accessing the Gluster mount on the Linux boxes shows
no problems at all; it's as happy to respond to requests as ever.)
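For anyone trying to reproduce this kind of test, a page like the Page of Death can be generated with a short script. The original test page isn't shown, so this is only a sketch under assumptions: it emits `n_images` `<img>` tags pointing at hypothetical, distinct paths under `/content/img`, on the reasoning that distinct names keep the browser from reusing cached responses, so each tag turns into its own request (and, in this setup, its own file read through the share).

```python
from html import escape


def build_page_of_death(n_images: int, base_url: str = "/content/img") -> str:
    """Return an HTML page with n_images <img> tags, each pointing at a different file.

    base_url and the numbered .png names are hypothetical placeholders.
    """
    # Distinct file names mean the browser can't reuse a cached response,
    # so every tag becomes a separate HTTP request to the web server.
    tags = "\n".join(
        f'<img src="{escape(base_url)}/{i}.png" alt="">' for i in range(n_images)
    )
    return f"<!DOCTYPE html>\n<html><body>\n{tags}\n</body></html>\n"
```

For example, writing `build_page_of_death(20_000)` to a file gives a 20,000-image page; the count is illustrative, chosen to fall in the "tens of thousands" range described above.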

Even if I scale it out to a swath of web servers, loading that single page,
one time, for just a few seconds will freeze every single web server, making
every website on the system inaccessible.

You may be asking, why am I asking here instead of on a Samba group, or even
a Windows group? Here's why: my first control is that I have a Windows file
server that I can swap in for Gluster, and I can load that page without it
blinking an eye (it actually becomes a test of the computer that the browser
is on). It does not affect any of the web servers in the slightest. My
second control is that I have exported the raw Gluster data directory as an
SMB share (with the exact same Samba configuration as the Gluster one), and
it performs as well as the Windows file server. I can load the Page of Death
with no consequence.
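Both controls amount to running the same small-file workload against different back ends. One rough way to put numbers on that, assuming Python is available on the Windows web server, is to time listing and reading many small files through each share. This is a sketch, not a benchmark harness; the function name and share paths are hypothetical.

```python
import os
import time


def time_small_reads(directory: str, limit: int = 1000) -> tuple[int, float]:
    """List `directory`, then fully read every regular file among the first `limit` entries.

    Returns (files_read, elapsed_seconds). The listing and the per-entry
    isfile() stat are inside the timed region, since metadata lookups are a
    large part of what a page with many image references costs.
    """
    start = time.perf_counter()
    files_read = 0
    for name in sorted(os.listdir(directory))[:limit]:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-regular entries
        with open(path, "rb") as fh:
            fh.read()
        files_read += 1
    return files_read, time.perf_counter() - start
```

For example, compare `time_small_reads(r"\\gluster-smb\content\img")` with `time_small_reads(r"\\raw-brick-smb\content\img")` (both share names hypothetical). Running each twice separates a cold first pass from a warm, cached one, which matters given the caching point raised in the first reply.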

I've pushed IO-threads all the way to the maximum of 64 without any benefit.
I can't see anything noteworthy in the Gluster or Samba logs, but perhaps I
don't know what to look for.

Thank you to anybody who can point me in the right direction. I am hoping I
don't have to dive into Wireshark or tcpdump territory, but I'm open if you
can guide the way! ;)

Ken

Joe Landman

2011-Jul-18 01:02 UTC

On 07/17/2011 08:56 PM, Ken Randall wrote:
> You may be asking, why am I asking here instead of on a Samba group, or
> even a Windows group?  Here's why:  My control is that I have a Windows
> file server that I can swap in Gluster's place, and I'm able to load
> that page without it blinking an eye (it actually becomes a test of the
> computer that the browser is on).  It does not affect any of the web
> servers' in the slightest.  My second control is that I have exported
> the raw Gluster data directory as an SMB share (with the same exact
> Samba configuration as the Gluster one), and it performs equally as well
> as the Windows file server.  I can load the Page of Death with no
> consequence.
NTFS with SMB sharing caches everything. The first page load may take a bit
of time, but subsequent loads will be served from data cached in RAM.

You can adjust SMB caching and Gluster caching as needed.
> I've pushed IO-threads all the way to the maximum 64 without any
> benefit.  I can't see anything noteworthy in the Gluster or Samba logs,
> but perhaps I am not sure what to look for.
That's not likely your issue. More probably it's the Gluster cache size,
coupled with some CIFS tuning you need.
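On the Gluster side, cache size is a per-volume option set with `gluster volume set <volume> <option> <value>`. The sketch below only builds and prints those commands unless told otherwise. The volume name `content-vol` is hypothetical, the values are illustrative starting points rather than tuned recommendations, and both option names (io-cache options) should be checked against the documentation for the GlusterFS version in use before applying. CIFS-side tuning lives in smb.conf and isn't covered here.

```python
import shlex
import subprocess

# Hypothetical volume name; substitute the real one.
VOLUME = "content-vol"

# Illustrative values only. The right cache size depends on how much RAM the
# machines that mount the volume (here, assumed to be the Samba hosts) can spare.
TUNING = {
    "performance.cache-size": "256MB",
    "performance.cache-refresh-timeout": "10",
}


def volume_set_commands(volume: str, options: dict[str, str]) -> list[list[str]]:
    """Build one `gluster volume set <volume> <key> <value>` argv per option."""
    return [["gluster", "volume", "set", volume, key, value]
            for key, value in options.items()]


def apply(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    apply(volume_set_commands(VOLUME, TUNING))  # dry run: prints only
```

Keeping `dry_run=True` as the default turns this into a review step: the printed commands can be checked by hand before anything touches the volume.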
>
> Thank you to anybody who can point me the right direction.  I am hoping
> I don't have to dive into Wireshark or tcpdump territory, but I'm open
> if you can guide the way!  ;)
You might need to strace -P the slow servers. It would help to know which
calls they are stuck on.
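Since the web servers here run Windows, strace would apply on the Linux side, for example to the smbd process serving the stalled client (an assumption about where to look). Running something like `strace -f -T -p <PID> -o trace.txt` records each call with the time spent in it in angle brackets (`-T`), following forked children (`-f`). `strace -c` already gives a per-syscall summary; the sketch below instead pulls out the individual slowest calls, so the paths they were touching are visible. The function name is hypothetical.

```python
import heapq
import re

# One completed call from `strace -f -T` output. The optional prefix covers
# both PID styles ("1234 " when writing with -o, "[pid  1234] " on the
# terminal); -T puts the time spent in the call in angle brackets at the end.
LINE_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?P<call>\w+)\(.*<(?P<secs>\d+\.\d+)>\s*$"
)


def slowest_calls(lines, n=10):
    """Return the n slowest completed calls as (seconds, name, line) tuples.

    Simplification: calls that -f splits into "<unfinished ...>" and
    "<... resumed>" halves are skipped, so some slow calls may be missed.
    """
    parsed = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = LINE_RE.match(line)
        if m:
            parsed.append((float(m.group("secs")), m.group("call"), line))
    return heapq.nlargest(n, parsed)
```

Usage: open `trace.txt` and pass the file object straight in, e.g. `for secs, call, line in slowest_calls(fh, 20): print(f"{secs:.3f}s {line}")`.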


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

Whit Blauvelt

2011-Jul-18 01:06 UTC

On Sun, Jul 17, 2011 at 07:56:57PM -0500, Ken Randall wrote:
> However, as a part of a different suite of tests is a Page of Death, which
> contains tens of thousands of image references on a single page. 
Off-topic response: is there ever, in real production, any page anywhere
that contains tens of thousands of image references? I'm all for testing at
the extreme, and capacity that goes far beyond what's needed for practical
purposes. Is that what this is, or do you anticipate real-life Page o' Death
scenarios?

Closer to the topic: what's going on with the load on the various systems?
On the Linux side, have you watched each of them with something like htop?
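htop is interactive, which makes it hard to line up with a test run. As a scriptable complement (Linux only, since it reads /proc/loadavg), the sketch below logs timestamped load averages on each Gluster node while the Page of Death is loaded, so the log can be compared with the moment the web servers stall. The function names are hypothetical, and load average alone won't say which process is busy; htop or top is still needed for that.

```python
import time


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def watch_load(interval: float = 5.0, samples: int = 12,
               path: str = "/proc/loadavg") -> None:
    """Print a timestamped load-average line every `interval` seconds, `samples` times."""
    for _ in range(samples):
        with open(path) as fh:
            one, five, fifteen = parse_loadavg(fh.read())
        print(f"{time.strftime('%H:%M:%S')}  load {one:.2f} {five:.2f} {fifteen:.2f}")
        time.sleep(interval)
```

With the defaults, `watch_load()` covers one minute; raise `samples` to span the several minutes the web servers take to recover.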

Whit
