Daniel McDonald
2010-Dec-09 01:07 UTC
[Ocfs2-users] Extremely poor write performance, but read appears to be okay
Hello,

I'm writing from the other side of the world from where my systems are, so details are coming in slowly. We have a 6TB OCFS2 volume shared across 20 or so nodes, all running OEL 5.4 with ocfs2-1.4.4. The system has worked fairly well for the last 6-8 months, but something has happened over the last few weeks that has driven write performance nearly to a halt. I'm not sure how to proceed, and very poor internet access is hindering me further. I've verified that the disk array is in good health. I'm seeing a few odd kernel log messages; an example follows below. I have not been able to check all nodes due to limited time and the slow internet in my present location. Any assistance would be greatly appreciated. I should be able to provide log files in about 12 hours. At the moment, load averages on each node are 0.00 to 0.09.

Here is a test write and the associated iostat -xm 5 output. Previously I was obtaining > 90 MB/s:

$ dd if=/dev/zero of=/home/testdump count=1000 bs=1024k

...and the associated iostat output:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    0.43   12.25    0.00   87.22

Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    1.80    0.00   8.40   0.00   0.04     9.71     0.01   0.64   0.05   0.04
sda1       0.00    0.00    0.00   0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00    0.00   0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda3       0.00    1.80    0.00   8.40   0.00   0.04     9.71     0.01   0.64   0.05   0.04
sdc        0.00    0.00  115.80   0.60   0.46   0.00     8.04     0.99   8.48   8.47  98.54
sdc1       0.00    0.00  115.80   0.60   0.46   0.00     8.04     0.99   8.48   8.47  98.54

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.00    0.55   12.25    0.00   87.13

Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.40    0.00   0.80   0.00   0.00    12.00     0.00   2.00   1.25   0.10
sda1       0.00    0.00    0.00   0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00    0.00   0.00   0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda3       0.00    0.40    0.00   0.80   0.00   0.00    12.00     0.00   2.00   1.25   0.10
sdc        0.00    0.00  112.80   0.40   0.44   0.00     8.03     0.98   8.68   8.69  98.38
sdc1       0.00    0.00  112.80   0.40   0.44   0.00     8.03     0.98   8.68   8.69  98.38
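(A side note for anyone reproducing this test: the dd above writes through the page cache, so on a healthy system the reported rate can be inflated by memory rather than reflecting the array. The following is only a sketch, not the command actually run here; /home/testdump is the same placeholder path, and conv=fsync / oflag=direct are standard GNU dd options that force the data to storage before a rate is reported.)

# Flush to disk before dd prints its throughput figure
dd if=/dev/zero of=/home/testdump bs=1024k count=1000 conv=fsync

# Or bypass the page cache entirely with direct I/O
dd if=/dev/zero of=/home/testdump bs=1024k count=1000 oflag=direct

# Watch the backing device while the test runs (extended stats, MB units, 5 s interval)
iostat -xm 5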
Here is a test read and the associated iostat output. I'm intentionally reading from a different test file to avoid caching effects:

$ dd if=/home/someothertestdump of=/dev/null bs=1024k

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.10    0.00    3.60   10.85    0.00   85.45

Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    3.79    0.00   1.40    0.00   0.02    29.71     0.00   1.29   0.43   0.06
sda1       0.00    0.00    0.00   0.00    0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00    0.00   0.00    0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda3       0.00    3.79    0.00   1.40    0.00   0.02    29.71     0.00   1.29   0.43   0.06
sdc        7.98    0.20  813.17   1.00  102.50   0.00   257.84     1.92   2.34   1.19  96.71
sdc1       7.98    0.20  813.17   1.00  102.50   0.00   257.84     1.92   2.34   1.19  96.67

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.07    0.00    3.67   10.22    0.00   86.03

Device:  rrqm/s  wrqm/s     r/s    w/s   rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
sda        0.00    0.20    0.00   0.40    0.00   0.00    12.00     0.00   0.50   0.50   0.02
sda1       0.00    0.00    0.00   0.00    0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda2       0.00    0.00    0.00   0.00    0.00   0.00     0.00     0.00   0.00   0.00   0.00
sda3       0.00    0.20    0.00   0.40    0.00   0.00    12.00     0.00   0.50   0.50   0.02
sdc        6.60    0.20  829.00   1.00  104.28   0.00   257.32     1.90   2.31   1.17  97.28
sdc1       6.60    0.20  829.00   1.00  104.28   0.00   257.32     1.90   2.31   1.17  97.28

I'm seeing a few odd kernel messages, such as:

Dec  7 14:07:50 growler kernel: (dlm_wq,4793,4):dlm_deref_lockres_worker:2344 ERROR: 84B7C6421A6C4280AB87F569035C5368:O0000000000000016296ce900000000: node 14 trying to drop ref but it is already dropped!
Dec  7 14:07:50 growler kernel: lockres: O0000000000000016296ce900000000, owner=0, state=0
Dec  7 14:07:50 growler kernel: last used: 0, refcnt: 6, on purge list: no
Dec  7 14:07:50 growler kernel: on dirty list: no, on reco list: no, migrating pending: no
Dec  7 14:07:50 growler kernel: inflight locks: 0, asts reserved: 0
Dec  7 14:07:50 growler kernel: refmap nodes: [ 21 ], inflight=0
Dec  7 14:07:50 growler kernel: granted queue:
Dec  7 14:07:50 growler kernel: type=3, conv=-1, node=21, cookie=21:213370, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
Dec  7 14:07:50 growler kernel: converting queue:
Dec  7 14:07:50 growler kernel: blocked queue:

Here is the df output:

root@growler:~$ df
Filesystem            1K-blocks       Used  Available Use% Mounted on
/dev/sda3             245695888   29469416  203544360  13% /
/dev/sda1                101086      15133      80734  16% /boot
tmpfs                  33005580          0   33005580   0% /dev/shm
/dev/sdc1            5857428444 5234400436  623028008  90% /home

Thanks,
-Daniel
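(Since not every node could be checked yet, a loop along these lines would show how widespread the dlm_deref_lockres_worker error above is. This is only a sketch: the hostnames are placeholders, not the real node names, and /var/log/messages is where OEL5's syslog puts kernel messages by default.)

# Placeholder node names; substitute the real cluster members
for n in node01 node02 node03; do
    echo "== $n =="
    ssh "$n" 'grep -c dlm_deref_lockres_worker /var/log/messages'
done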
Sunil Mushran
2010-Dec-09 01:49 UTC
[Ocfs2-users] Extremely poor write performance, but read appears to be okay
http://oss.oracle.com/git/?p=ocfs2-1.4.git;a=commitdiff;h=1f667766cb67ed05b4d706aa82e8ad0b12eaae8b

That specific error has been addressed in the upcoming 1.4.8. Attach the logs and all other info to a bugzilla.
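(As a rough sketch of how to confirm what each node is actually running before filing the bug; the command names below are the ones shipped with ocfs2-tools on OEL5, and /dev/sdc1 matches the device in the df output above.)

# Installed ocfs2 userspace and kernel-module packages
rpm -qa | grep -i ocfs2

# Version information reported by the loaded kernel module
modinfo ocfs2

# Cluster stack status, and which nodes have the volume mounted
service o2cb status
mounted.ocfs2 -f /dev/sdc1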