thr3ads.net - Ocfs2 users - [Ocfs2-users] Doubts about OCFS2 Performance [Jul 2010]

Jeronimo Bezerra

2010-Jul-27 13:32 UTC

[Ocfs2-users] Doubts about OCFS2 Performance

Hello all,

I need some help understanding a disk/OCFS2 performance problem.
Let me introduce my environment:

I use OCFS2 in a mail environment with almost 10k users, on an OCFS2
partition of 2 TB (~1 TB in use). There are a lot of small files, and
the block size is 2 KB. The host runs Debian Etch Linux and is attached
to an IBM DS4500 storage array through a QLA2340 HBA.

A few weeks ago I started to notice poor performance whenever a mail is
sent to all users (all-l), mainly when the e-mail is larger than 80 KB
(yes, I know, it shouldn't happen, but here we have friendly fire!).
This situation is new, because the environment has been running for
almost 3 years. When such an e-mail appears in my Postfix queue, after a
few seconds my load average goes to 100 -> 200 -> 300! Yesterday I
paused the delivery of these e-mails in Postfix (postsuper -h ALL) and
after one minute the load average went down to 2,31! One very strange
thing is the mpstat output at that moment of high load:

09:28:17     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
09:28:18     all    7,05    0,00    2,59   11,12    0,00    0,22    0,00   79,02   1788,12
09:28:18       0   37,62    0,00   15,84   41,58    0,00    3,96    0,00    0,99   1790,10
09:28:18       1    2,97    0,00    4,95    5,94    0,00    0,00    0,00   92,08      0,00
09:28:18       2    0,00    0,00    1,98    6,93    0,00    0,00    0,00  112,87      0,00
09:28:18       3    0,00    0,00    0,99    4,95    0,00    0,00    0,00  158,42      0,00
09:28:18       4    0,00    0,00    0,00    0,00    0,00    0,00    0,00  100,99      0,00
09:28:18       5    0,99    0,00    1,98   31,68    0,00    0,00    0,00   70,30      0,00
09:28:18       6    0,00    0,00    0,00    0,00    0,00    0,00    0,00  185,15      0,00
09:28:18       7    0,00    0,00    0,00    0,00    0,00    0,00    0,00   52,48      0,00
09:28:18       8   29,70    0,00    7,92   57,43    0,00    0,00    0,00    6,93      0,00
09:28:18       9    2,97    0,00    5,94   43,56    0,00    0,00    0,00   50,50      0,00
09:28:18      10   47,52    0,00    3,96    1,98    0,00    0,00    0,00   54,46      0,00
09:28:18      11    0,00    0,00    0,00    3,96    0,00    0,00    0,00   99,01      0,00
09:28:18      12    0,00    0,00    0,00    0,00    0,00    0,00    0,00   99,01      0,00
09:28:18      13    3,96    0,00    1,98    0,00    0,00    0,00    0,00   99,01      0,00
09:28:18      14    0,00    0,00    0,00    0,00    0,00    0,00    0,00  138,61      0,00
09:28:18      15    0,00    0,00    0,00    1,98    0,00    0,00    0,00   99,01      0,00

09:31:44     CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
09:31:45     all    1,10    0,00    2,88   11,22    0,00    0,25    0,00   84,55   1811,76
09:31:45       0    6,86    0,00   13,73   69,61    0,00    3,92    0,00    5,88   1810,78
09:31:45       1    0,98    0,00    2,94    2,94    0,00    0,00    0,00   96,08      0,00
09:31:45       2    0,98    0,00    1,96    9,80    0,00    0,00    0,00   90,20      0,00
09:31:45       3    0,00    0,00    1,96    1,96    0,00    0,00    0,00   94,12      0,00
09:31:45       4    0,98    0,00    0,00    0,00    0,00    0,00    0,00   99,02      0,00
09:31:45       5    0,00    0,00    0,98    0,98    0,00    0,00    0,00   97,06      0,00
09:31:45       6    0,00    0,00    2,94    4,90    0,00    0,00    0,00   95,10      0,00
09:31:45       7    0,00    0,00    1,96    9,80    0,00    0,00    0,00   86,27      0,00
09:31:45       8    1,96    0,00    5,88   50,00    0,00    0,00    0,00   41,18      0,00
09:31:45       9    1,96    0,00    0,98    0,98    0,00    0,00    0,00   92,16      0,00
09:31:45      10    0,98    0,00    2,94    8,82    0,00    0,00    0,00   84,31      0,00
09:31:45      11    2,94    0,00    1,96    1,96    0,00    0,00    0,00   94,12      0,00
09:31:45      12    0,00    0,00    1,96    0,98    0,00    0,00    0,00   97,06      0,00
09:31:45      13    0,00    0,00    1,96    0,98    0,00    0,00    0,00   94,12      0,00
09:31:45      14    0,00    0,00    1,96    7,84    0,00    0,00    0,00   95,10      0,00
09:31:45      15    0,00    0,00    0,98    7,84    0,00    0,00    0,00   93,14      0,00

I don't understand why only one CPU (of the 16) is at 100% utilization
at the moment of high load average, and why mpstat shows that CPU 0
alone handles almost all interrupts/s. In htop, too, only CPU 0 shows
high utilization, and that seems strange to me. At that moment the
DS4500 looks normal and shows about 7-8 MB/s of traffic from my mail host.
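
One way to check the interrupt imbalance that mpstat reports is to total
the per-CPU counters in /proc/interrupts. The following is a minimal
Python sketch, not a diagnosis: the SAMPLE text is made-up illustrative
data (it is not output from the server described here), and the function
name is hypothetical. On a real host the same function could be given
the contents of /proc/interrupts instead.

```python
# Hypothetical sample in the layout of /proc/interrupts (illustrative only).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  0:       1200          0          0          0   IO-APIC-edge      timer
 16:     985000          0          0          0   IO-APIC-fasteoi   qla2xxx
 17:      40000        150          0          0   IO-APIC-fasteoi   eth0
NMI:          0          0          0          0   Non-maskable interrupts
ERR:          0
"""


def per_cpu_interrupt_totals(text):
    """Sum the interrupt counters of each CPU column."""
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the IRQ label ("16:", "NMI:"); counters follow it.
        for i, value in enumerate(fields[1:1 + len(cpus)]):
            if not value.isdigit():
                break                    # reached the description text
            totals[i] += int(value)
    return dict(zip(cpus, totals))


if __name__ == "__main__":
    totals = per_cpu_interrupt_totals(SAMPLE)
    grand_total = sum(totals.values()) or 1
    for cpu, count in totals.items():
        print(f"{cpu}: {count} ({100.0 * count / grand_total:.1f}%)")
```

If one CPU holds nearly all the counts for the HBA's IRQ line, the
affinity mask in /proc/irq/<n>/smp_affinity is one place to look next.
Whether that explains the load seen here is an open question.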

So, what can I do to find out why my server has this bottleneck? Any
help would be appreciated.

Thank you,

Jeronimo Bezerra

Jeronimo Bezerra

2010-Jul-27 14:44 UTC

[Ocfs2-users] Doubts about OCFS2 Performance

Thank you, Aaron, for the quick answer. My comments are below:

On 27/07/2010 10:57, Aaron Thompson wrote:
> This looks like a disk issue - contention, or wait time. It could be
> that the time needed to write that 80k message to all users'
> mailboxes is throttling your disk connection, or pushing some limit for
> file size that moves the I/O into a larger set of blocks than smaller
> messages would use. It looks and sounds like you may be waiting for
> the disk to write those messages - I guess it depends on the size of
> *all*.

OK, I suspect that too. I intend to increase the block size from 2 KB to
4 KB and to split my 2 TB partition into 4-5 partitions of 400 GB each,
to share the load between the two main controllers of the storage
device. Do you think this is a good improvement, or just more overhead?
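
To put rough numbers on the block-size question, here is a small Python
sketch of the arithmetic. It assumes exactly 10,000 recipients, an
80 KiB message, and a steady 7.5 MiB/s (the midpoint of the 7-8 MB/s
reported above); all three are round figures chosen for illustration.
It ignores metadata, journaling, and cluster locking, and it counts
blocks only as a rough unit of I/O, so it estimates data volume only.

```python
import math

USERS = 10_000                 # assumption: "almost 10k users"
MESSAGE_BYTES = 80 * 1024      # assumption: an 80 KiB message
RATE_MIB_S = 7.5               # assumption: midpoint of 7-8 MB/s


def blocks_needed(size_bytes, block_bytes):
    """Number of whole blocks needed to hold size_bytes."""
    return math.ceil(size_bytes / block_bytes)


def total_mib(users, size_bytes):
    """Total data written when every user receives one copy."""
    return users * size_bytes / 2**20


if __name__ == "__main__":
    for block in (2048, 4096):
        per_copy = blocks_needed(MESSAGE_BYTES, block)
        print(f"{block} B blocks: {per_copy} per copy, "
              f"{per_copy * USERS} in total")
    volume = total_mib(USERS, MESSAGE_BYTES)
    print(f"data volume: {volume:.2f} MiB")
    print(f"time at {RATE_MIB_S} MiB/s: {volume / RATE_MIB_S:.0f} s")
```

Under these assumptions a single all-users message is several hundred
MiB of data, so even at a constant rate the writes alone would keep the
volume busy for well over a minute.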

One doubt: is this contention caused by Debian (and its I/O and OCFS2
layers) or by the storage device? I ran some I/O benchmarks on Debian
with OCFS2 and reached almost 100 MB/s! I know that the benchmark
profile is different from a mail environment (with a lot of small
files), but...
>
> Your load is a function of more than CPU - your IO wait is in there
> somewhere also. I would suggest iostat, it may give you a better view
> of which disk is doing how much work. I believe this is packaged with
> a few other utilities as sysstat in Debian (I've been on RHEL for a
> while so make sure you check)

Today the 2 TB partition is spread over 20 FC disks in a RAID 5 array.
iostat didn't help much:

Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
dm-0           4637,25         6,99         2,07          7          2
dm-0           1491,18         2,91         0,00          2          0
dm-0           1535,51         2,58         0,41          2          0
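
Dividing throughput by tps gives the average size of each transfer in
the iostat samples above. A minimal Python sketch of that arithmetic,
assuming that iostat's MB means 1024 KB and using a hypothetical helper
name:

```python
# (tps, MB_read/s, MB_wrtn/s) for dm-0, copied from the iostat output above.
SAMPLES = [
    (4637.25, 6.99, 2.07),
    (1491.18, 2.91, 0.00),
    (1535.51, 2.58, 0.41),
]


def avg_transfer_kb(tps, read_mb_s, write_mb_s):
    """Average KB moved per transfer (assumes 1 MB = 1024 KB)."""
    return (read_mb_s + write_mb_s) * 1024 / tps


if __name__ == "__main__":
    for tps, read_mb_s, write_mb_s in SAMPLES:
        size = avg_transfer_kb(tps, read_mb_s, write_mb_s)
        print(f"{tps:8.2f} tps -> {size:.2f} KB per transfer")
```

All three samples come out at about 2 KB per transfer, which matches the
2 KB block size mentioned earlier, so the device appears to be handling
many small requests rather than a few large ones.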

Any other advice? Thanks again

Jeronimo
>
> Good Luck.
>
> @
>
> Aaron Thompson      Applications Administrator / Database Administrator
> http://www.uni.edu/~prefect/                University of Northern Iowa
>
> "All it takes to fly is to hurl yourself at the ground... and miss."
>                                                         -Douglas Adams
>
