I have been running a 2-node cluster of web servers handling around 800,000 - 1,200,000 requests on a typical weekday. The systems are each dual quad-core Xeon w/4GB RAM, and generally maintain a fairly low load average. The servers share a device via iSCSI SAN for the OCFS2 volume. They are running RHEL5 U2 (2.6.18-92.1.10.el5) and OCFS2 1.2.9-1 binaries of the matching kernel version.

The cluster has been in a production environment for about 1.5 months now, and just in the past week it has started to have problems. The user experience is an occasional lag of 5 to 15 seconds, after which everything appears normal. Digging deeper into the problem, I narrowed it to an I/O issue: iostat shows near 100% utilization on said device during the lag. Once it clears, the utilization is back down to a consistent 0-5% average. Also, while the lag is happening, a process listing shows the affected processes in the D state.

After some reading of the mailing list archives and much google searching, I found this old thread which seems quite similar:
http://oss.oracle.com/pipermail/ocfs2-users/2007-May/001643.html

And the corresponding bug from the same author:
http://oss.oracle.com/bugzilla/show_bug.cgi?id=882

I see it has been a long time since this bug was last updated, but there seems to have been no [documented] resolution.

I wrote a small script which hits a very simple web page at 10-second intervals. If the server responds in less than 1 second it outputs a dot; if the page takes more than 1 second to load it outputs a line indicating how long it took, which node processed it, and a timestamp. Here is some sample output over the course of about an hour:

................
Real time of 1 for wolf at Thu Jan 22 16:35:32 CST 2009
............................................................
Real time of 6 for cow at Thu Jan 22 16:45:52 CST 2009
.....
Real time of 7 for cow at Thu Jan 22 16:46:59 CST 2009
.................................................................................
Real time of 12 for wolf at Thu Jan 22 17:00:54 CST 2009
...................................
Real time of 3 for wolf at Thu Jan 22 17:06:58 CST 2009
....................
Real time of 13 for cow at Thu Jan 22 17:10:42 CST 2009
...
Real time of 11 for cow at Thu Jan 22 17:11:33 CST 2009
.......................
Real time of 9 for cow at Thu Jan 22 17:15:43 CST 2009
.......................................
Real time of 2 for wolf at Thu Jan 22 17:22:27 CST 2009
..
Real time of 8 for wolf at Thu Jan 22 17:23:05 CST 2009
.............................
Real time of 10 for wolf at Thu Jan 22 17:28:17 CST 2009
....................................
Real time of 14 for wolf at Thu Jan 22 17:34:42 CST 2009

There is no real pattern here, but the output is interesting nonetheless -- it shows the problem is intermittent, but quite frequent.

I tried using "ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN" to see which processes were hanging on each system during the lags. Like Stephan's output, I see a few processes in the D (uninterruptible sleep) state, which makes sense if they are performing I/O. And also like Stephan, one node shows "-" as the wait channel, and the other shows "ocfs2_wait_for_status_completion" as the function.

node cow:
PID    STAT  COMMAND   WIDE-WCHAN-COLUMN
8511   D     cronolog  ocfs2_wait_for_status_completion
8510   D     cronolog  ocfs2_wait_for_status_completion

node wolf:
PID    STAT  COMMAND   WIDE-WCHAN-COLUMN
9439   D     cronolog  -

In my case, however, it was not the httpd process that was listed, but rather cronolog (which is what I use with apache for automatic daily log rotation). I assume that in his case httpd was writing directly to the log files at the time of the hangs. Also, in my case there was typically only a single process on each system in the D state, only sometimes more.

Another thing I tried was to shut down httpd on one of the nodes to see if the lags went away -- the result was unchanged, still similar lags at similar intervals. This time, the "ps" output on the non-participating node was clean, but the other had multiple processes, and not all web server related:

node wolf:
PID    STAT  COMMAND   WIDE-WCHAN-COLUMN
9436   D     cronolog  -
9438   D     cronolog  -
9439   D     cronolog  -
9448   D     cronolog  -
9457   D     cronolog  -
31201  D     httpd     -
31202  D     httpd     -
31710  D     vsftpd    sync_buffer

So apparently this problem can happen between processes on the same node, not just between nodes.

Watching "top" on each box during the lags only shows CPU time in iowait, no significant user/system CPU.
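For reference, a minimal sketch of the kind of probe script described above (the URL is a placeholder; assumes bash, curl, and that the page itself reports which node served it):

  #!/bin/bash
  # Hit a simple page every 10 seconds; print a dot on fast responses,
  # otherwise report the elapsed seconds, the serving node, and a timestamp.
  URL="http://example.com/probe.html"   # placeholder
  while true; do
      start=$(date +%s)
      node=$(curl -s "$URL")            # assumes the page returns the node name
      end=$(date +%s)
      elapsed=$((end - start))
      if [ "$elapsed" -lt 1 ]; then
          printf '.'
      else
          printf '\nReal time of %s for %s at %s\n' "$elapsed" "$node" "$(date)"
      fi
      sleep 10
  done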
The two issues are different. For starters, the issue in bugzilla #882 was a hang, not a temporary slowdown. And the stack showed it was related to flock(), which ocfs2 1.2 did not support -- well, not cluster-aware flock(). Support for clustered flock() was added around kernel 2.6.25 / ocfs2 1.4.

Are the apache logs also hitting the iSCSI storage? If so, one explanation could be that the log flush is saturating the network. That would cause the I/O utilization to jump, affecting the httpd I/Os.

Sunil

David Johle wrote:
> I have been running a 2 node cluster of web servers handling around
> 800,000 - 1,200,000 requests on a typical weekday.
> [...]
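A quick way to confirm whether the Apache logs land on the iSCSI-backed volume, and to watch that device during a lag window (log paths and device name below are assumptions):

  # Which filesystem do the logs actually live on?
  # (adjust paths to match your CustomLog/ErrorLog settings)
  df -h /var/log/httpd /var/www

  # Watch extended device statistics during a lag; look at the %util
  # column for the iSCSI/multipath device reported by iostat
  iostat -x 5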
I want to throw in my experience here: you WILL have issues if you have reasonably high load with multiple nodes performing writes to log files. It will hang all processes waiting on the write lock. In the case of apache, especially when using prefork, this is a VERY BAD idea. It is highly, highly recommended to move your log files outside of the cluster FS.

We run ocfs2 in production: hundreds of gigs of files, with 5 nodes per cluster, 4 clusters, and sustained throughput of 200 Mbit/s being served by these nodes. All this is on a fibre channel SAN with very fast arrays. Even so, when we stumble across the occasional log file contention and remove it (say, a php file writing to a log file, or apache, etc.), load drops considerably. When we had apache writing log files to the cluster FS, at a fraction of the current load, apache choked and died; it couldn't handle the contention. And the contention escalates in a non-linear fashion.

My recommendations:

1. Move all logs, and any files which are written to concurrently, off to the local system (yes, it makes administration harder, but it makes the systems happier; it was a MUST for our environment, and we've managed to push OCFS2 scaling a lot farther by doing so)
2. Switch away from prefork if you are using it
3. Make absolutely sure you're mounted with noatime
4. If possible, switch away from apache and use lighttpd (we recently did this, and were able to gain another 20-30% out of each box)

Relevant lighttpd settings:

server.max-fds = 2048
server.max-write-idle = 120
server.max-worker = 8
server.network-backend = "linux-sendfile"
server.stat-cache-engine = "simple"
server.max-keep-alive-requests = 10
server.max-keep-alive-idle = 8
server.event-handler = "linux-sysepoll"

Again, we've been using OCFS2 in production for 2 years, through many problems, and I highly recommend moving the log files to local storage, OR piping them to a network-capable syslog (also a possible solution).

Michael
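If local log files really aren't an option, a rough sketch of the syslog-piping alternative mentioned above (facility, tag, and paths are arbitrary choices, not a recommendation of specific values):

  # httpd.conf -- send access and error logs through logger to syslog
  # instead of writing files on the OCFS2 volume
  CustomLog "|/usr/bin/logger -t httpd -p local6.info" combined
  ErrorLog  "|/usr/bin/logger -t httpd -p local6.err"

syslog can then forward to a central loghost, keeping all log writes off the shared volume entirely.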
Bret,

Also, some Intel adapters had a problem that requires a firmware fix (performed via ethtool) using fixeep-82573-dspd.sh.

(I still doubt that is the problem in this case, but it's almost always worth checking whether you have an Intel NIC affected by that bug.)

Michael
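A hedged sketch of how one might identify whether a given interface is one of the 82573-based adapters before running the fix script (the interface name is an assumption; the fixeep script itself does the authoritative check and repair):

  # Identify the NIC model and driver
  lspci | grep -i ethernet
  ethtool -i eth0

  # Dump the EEPROM for inspection
  ethtool -e eth0 | head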
Well, I can confirm that in my case the Intel NICs are not affected by this bug. I checked the EEPROM dump, as well as used the script to verify, and neither indicates a problem.

On these particular servers, the iSCSI target is being accessed over 2 NICs: one is a Broadcom onboard, and one is an Intel PCIe card. I'm using device-mapper for managing the active/active multipathing. All in all I've had great performance with this setup, actually seeing very little overhead despite using a software initiator.

At 05:01 PM 1/27/2009, ocfs2-users-request at oss.oracle.com wrote:
> Also, some Intel adapters had a problem that requires a firmware fix
> (performed via ethtool) using fixeep-82573-dspd.sh
> [...]
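For anyone setting up something similar, a minimal sketch of an active/active dm-multipath configuration of this sort (the values below are illustrative assumptions, not the actual config used here):

  # /etc/multipath.conf -- illustrative only
  defaults {
      user_friendly_names  yes
      path_grouping_policy multibus    # all paths in one group => active/active
      path_checker         tur
      failback             immediate
  }
  # verify the resulting topology with: multipath -ll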
Sunil,

Hmm, I thought you meant the cluster size.

Regards,
Luis

--- On Wed, 1/28/09, Sunil Mushran <sunil.mushran at oracle.com> wrote:

From: Sunil Mushran <sunil.mushran at oracle.com>
Subject: Re: [Ocfs2-users] ocfs2 hangs during webserver usage
To: lfreitas34 at yahoo.com
Cc: ocfs2-users at oss.oracle.com
Date: Wednesday, January 28, 2009, 1:08 AM

Luis Freitas wrote:
> The extent size would play a role in this also, as Sunil pointed out. You
> could check if the extent size is different on your test environment. The
> mkfs tool might have defaulted to a larger extent size if the total size
> of the filesystem is larger. (Sunil, correct me on this if I am wrong.)

The extent size I was referring to is not related to any on-disk size. It depends on the location the write() is issued to. Say you have a 100K file and you issue a write() at offset 1G. If the fs supports sparse files, it will allocate (and init) blocks only around the 1G location, leaving a hole in between. A fs without sparse file support, on the other hand, will need to allocate (and init) the entire space between 100K and 1G.
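A quick way to see that difference on a sparse-file-capable filesystem (GNU dd assumed; the filename is arbitrary):

  # Write ~100K of data, then one byte at a 1G offset.
  dd if=/dev/zero of=sparse_test bs=1K count=100
  dd if=/dev/zero of=sparse_test bs=1 count=1 seek=1G conv=notrunc

  # Compare apparent size vs. blocks actually allocated:
  ls -lh sparse_test   # apparent size: just over 1G
  du -h  sparse_test   # allocated size: far smaller if the fs supports holes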
It's also worth pointing out that even writing to separate log files on the ocfs2 volume isn't ideal. For one, with iSCSI you're probably using only gigabit Ethernet (you could be using 10GigE, but you didn't mention it). Since performance is a concern, I would move what I could off of the ocfs2 volume (including, where possible, the shared codebase) to local storage, and use symlinks where needed. For logging, again, I highly recommend using local storage only.

As I said, we've been using this in production for many webservers for a long time (over 2 years), and we've learned a lot of lessons over those years:

1. Log to local disk only; we've had MANY problems with write contention on log files (this is my number one recommendation, and moving those log files will probably eliminate your problem).
2. Move common code/files to local storage where possible (for instance, we have vBulletin, with the attachments etc. on the ocfs2 volume, but the php files which are common are on local storage).

Michael
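As a rough illustration of that layout (all paths below are assumptions):

  # Code and logs live on local disk; only genuinely shared data
  # (e.g. uploaded attachments) stays on the OCFS2 volume.
  mkdir -p /var/www/site /var/log/httpd
  ln -s /cluster/ocfs2/attachments /var/www/site/attachments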
> I will try changing the logging to use separate files for a day or
> so to at least see what effect it has with regards to this issue... stay tuned.

Okay, that was easy to test, and it failed -- still plenty of lag situations.

As expected, the results agree with another test I did, which was to shut down one of the two nodes to the point of unmounting the shared filesystem. I believe the problem here is that the child processes of Apache locally sharing the logfile is just as bad as sharing it across nodes. Perhaps that is the root cause of the problem, and the fact that additional nodes are affected is just a consequence. Michael Moody suggested not using the prefork MPM, presumably for this reason.

I wonder, does the worker MPM "consolidate" all the writes from the threads within a worker into a single buffered write stream? If so, I can see how having 32 threads per worker would not reduce the amount of writes to the logfile, but could reduce the number of contending I/O requests by a factor of 32. Is this the case?

Way back when we first transitioned to Apache 2.x we looked into using a better MPM, but had some necessary modules installed that were not thread-safe and thus would only work with prefork. I am pretty sure this is still a constraint for us at this point.
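For what it's worth, a quick way to confirm which MPM and which modules the running Apache build actually uses (the binary may be httpd or apache2 depending on the distro):

  # MPM the binary was built with, plus compiled-in modules
  httpd -V | grep -i mpm
  httpd -l

  # Dynamically loaded modules as well (Apache 2.2+)
  httpd -M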
David,

Again, I recommend local storage for the logs. I know I've said it many times, but I've run into these SAME issues, and local was the way to go. For log parsing across nodes, or for tailing them all at the same time, we use Capistrano (a ruby-based project), which easily lets me tail log files across multiple servers (or perform any operation, for that matter) all in one window. Really though, try local storage for your logs.

In regards to the MPM: prefork has benefits, and also severe disadvantages. I'm not certain that the worker MPM buffers the logs, and of course you'll want multiple worker processes (and threads), but it certainly is an improvement over prefork. However, if you have non-threadsafe modules (just curious, which ones?), as you stated, you may have trouble. This is one of the reasons that, where we can, we switched to lighttpd.

In regards to non-threadsafe modules, if you are specifically speaking of handlers (php, ruby, perl, etc.), you could simply migrate those to FastCGI (I recommend mod_fcgid). It's a considerable improvement, and works great with non-threadsafe modules/libraries.

Your best solution, however, is still to use local log files. Again, it's easy to test, and if the lags go away, you'll know.

Michael
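For the cross-node tailing, even a plain ssh loop gets close to what Capistrano is used for here (hostnames and log path are assumptions):

  # Tail the same log on every node into one terminal,
  # prefixing each line with the host it came from.
  for host in cow wolf; do
      ssh -n "$host" tail -f /var/log/httpd/access_log | sed "s/^/[$host] /" &
  done
  wait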