Hello,

We are using OCFS2 version 1.4.3 on CentOS 5, x86_64, with 8 GB of memory. The underlying storage is an HP 2312fc smart array with 12 SAS 15K rpm disks, configured as RAID10 using 10 HDDs plus 2 spares. The array has about 4 GB of cache. Communication is 4 Gbps FC through an HP StorageWorks 8/8 Base e-port SAN switch. Right now only this machine is connected to the SAN through the switch, but we plan to add more machines to utilize this SAN system.

Our application is Apache version 1.3.41, mostly serving static HTML files plus a few PHP pages. Note that we had to downgrade to 1.3.41 due to our application requirements. Apache is configured with MaxClients 500.

The OCFS2 volume is formatted with mkfs.ocfs2 without any special options. It runs directly on the multipathed SAN storage, without LVM or software RAID. We mount OCFS2 with noatime, commit=15, and data=writeback (as well as heartbeat=local). Our cluster.conf looks like this:

    cluster:
            node_count = 1
            name = mycluster

    node:
            ip_port = 7777
            ip_address = 203.123.123.123
            number = 1
            name = mycluster.mydomain.com
            cluster = mycluster

(NOTE: Some details are omitted here, such as the real hostname and IP address.)

Periodically, we find that the file system becomes very slow; it seems to happen once every few minutes. When the file system is slow, httpd CPU utilization climbs to about 50% or above. I tried to debug this slowness with a small script (sketched at the end of this message) that periodically runs

    strace -f dd if=/dev/zero of=/san/testfile bs=1k count=1

and times dd. Usually dd finishes in under a second, but periodically it is much slower, taking about 30-60 seconds. The strace output shows this:

    0.000026 open("/san/testfile", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 1
    76.418696 rt_sigaction(SIGUSR1, NULL, {SIG_DFL, [], 0}, 8) = 0

So I presume this means the open system call is periodically very slow. I ran about 5-10 tests, which yielded similar strace results (ranging from 5-7 seconds up to 80 seconds).

So my questions are: what could be the cause of this slowness? How can I debug this more deeply? At which points should we optimize the file system?

We are in the process of purchasing and adding more web servers to the system, using a reverse proxy to load-balance between the two servers. We just want to make sure that this will not make the situation worse.
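For reference, the timing script mentioned above is essentially the following (a simplified sketch; the mount point, output paths, 5-second threshold, and 60-second interval are just what we happen to use):

    #!/bin/sh
    # Periodically time a tiny write on the OCFS2 mount and keep the
    # strace output whenever a run is unusually slow.
    while true; do
        start=$(date +%s)
        strace -f -r -o /tmp/dd.strace \
            dd if=/dev/zero of=/san/testfile bs=1k count=1 2>/dev/null
        end=$(date +%s)
        elapsed=$((end - start))
        if [ "$elapsed" -gt 5 ]; then
            echo "$(date): dd took ${elapsed}s" >> /tmp/slow-dd.log
            cp /tmp/dd.strace /tmp/dd.strace.$start
        fi
        sleep 60
    done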
Somsak Sriprayoonsakul wrote:
> We are using OCFS2 version 1.4.3 on CentOS 5, x86_64, with 8 GB of memory.
> The underlying storage is an HP 2312fc smart array with 12 SAS 15K rpm
> disks, configured as RAID10 using 10 HDDs plus 2 spares. [...]

Is that using the cciss driver? I have heard of similar sporadic performance issues with the cciss driver. I doubt this is an OCFS2 issue. I would recommend you ping some support people who can look at your IO setup more closely.
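If you want to confirm which low-level driver is actually handling the SAN paths, something like this should show it (a rough sketch; driver names to grep for and the multipath device names will differ on your box):

    # Loaded storage drivers (cciss is HP Smart Array; qla2xxx/lpfc are FC HBAs)
    lsmod | egrep 'cciss|qla2xxx|lpfc'

    # The low-level driver behind each SCSI host
    for h in /sys/class/scsi_host/host*; do
        echo "$h: $(cat $h/proc_name)"
    done

    # Which sdX paths make up the multipath device
    multipath -ll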
On Tue, 19 Jan 2010 16:12:07 +0700 Somsak Sriprayoonsakul <somsaks at gmail.com> wrote:
> Periodically, we find that the file system becomes very slow; it seems to
> happen once every few minutes. When the file system is slow, httpd CPU
> utilization climbs to about 50% or above. [...]

Hi Somsak,

I observed high loads and Apache slowness when there was insufficient contiguous free space due to fragmentation. I believe it was because Apache couldn't write its log files efficiently. We had two Apache nodes, and I found that stopping Apache on the problem node resolved the problem until I deleted lots of unused files.

My symptoms don't seem to suggest a slow open() syscall like your strace results are showing, but I certainly got the high load and poor Apache performance, so it might be worth checking out anyway. There is a bug report, and we're just waiting for the patch to be reviewed and made publicly available:

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1189

Cheers,
Brad
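P.S. If you want a rough picture of fragmentation, debugfs.ocfs2 can dump the free-space layout and per-file extent information without modifying the volume. A sketch (the device path and log file path are placeholders; substitute your own):

    # Cluster size and total/free clusters for the volume
    debugfs.ocfs2 -R "stats" /dev/mapper/sanvol

    # How free space is spread across the global bitmap's chains
    debugfs.ocfs2 -R "stat //global_bitmap" /dev/mapper/sanvol

    # Extent list of one frequently appended file (e.g. an Apache log);
    # a long extent list on a modest-sized file suggests fragmentation
    debugfs.ocfs2 -R "stat /logs/access_log" /dev/mapper/sanvol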