Hi all,

a recent posting here (which I can't find atm) has pointed me to
http://jira.whamcloud.com/browse/LU-15, where an issue is discussed that
we seem to see as well: some OSS really get overloaded, and the log says

  slow journal start 36s due to heavy IO load
  slow commitrw commit 36s due to heavy IO load
  slow start_page_read 169s due to heavy IO load
  slow direct_io 34s due to heavy IO load
  ...

The bugzilla discussion seems to propose a number of steps to take on
each OSS as a workaround, among them setting
readcache_max_filesize=32M or readcache_max_filesize=0

I have checked the current value of this parameter and found
readcache_max_filesize=18446744073709551615
which translates to 16 EB (if I counted the powers of 1024 correctly).
Am I correct in assuming that this is the default value, and that this
default is meant to read "unlimited"? Or is our OSS configuration just
badly messed up?

Also, people recommend pinning the bitmaps to memory - how do you do that?

Preallocation tables all seem to contain "256 512 1024", so no shrinking
of prealloc_table is necessary.
The OSTs in question have just reached the 85% level. We have a number
of older OSS which are closer to 95% - I guess the problem doesn't show
up there, because there is no room for further files anyhow...

Regards,
Thomas
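A minimal sketch of how readcache_max_filesize can be inspected and
changed per OST with lctl on an OSS, assuming an obdfilter-based setup
with 1.8-style proc paths (parameter names and accepted syntax may
differ between Lustre versions):

  # show the current value for every OST served by this OSS
  lctl get_param obdfilter.*.readcache_max_filesize

  # the same values straight from /proc
  cat /proc/fs/lustre/obdfilter/*/readcache_max_filesize

  # apply the LU-15 workaround value; takes effect immediately but is
  # not persistent across a remount
  lctl set_param obdfilter.*.readcache_max_filesize=32M

Setting the parameter to 0 instead should effectively disable the read
cache for file data altogether, which is the more aggressive variant of
the same workaround.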
Andreas Dilger
2011-May-07 07:08 UTC
[Lustre-discuss] high OSS load - readcache_max_filesize
On 2011-05-05, at 11:39 AM, Thomas Roth <t.roth at gsi.de> wrote:
> a recent posting here (which I can't find atm) has pointed me to
> http://jira.whamcloud.com/browse/LU-15, where an issue is discussed that
> we seem to see as well: some OSS really get overloaded, and the log says
>
> slow journal start 36s due to heavy IO load
> slow commitrw commit 36s due to heavy IO load
> slow start_page_read 169s due to heavy IO load
> slow direct_io 34s due to heavy IO load
> ...
>
> The bugzilla discussion seems to propose a number of steps to take on
> each OSS as a workaround, among them setting
> readcache_max_filesize=32M or readcache_max_filesize=0
>
> I have checked the current value of this parameter and found
> readcache_max_filesize=18446744073709551615
> which translates to 16 EB (if I counted the powers of 1024 correctly).

Right - this is 2^64 - 1.

> Am I correct in assuming that this is the default value, and that this
> default is meant to read "unlimited"?

Correct.

> Or is our OSS configuration just badly messed up?
>
> Also, people recommend pinning the bitmaps to memory - how do you do that?

There is no mechanism to do this today. It is possible to preload the
bitmaps at mount time, or I guess it may be possible to write a program
that mapped the bitmaps from the disk and then mlock'd that memory, but
it would pin 32 MB of RAM per TB of filesystem. If an OSS has 8x 8 TB
OSTs, that is 2 GB of RAM. I think there are more efficient solutions
than this.

> Preallocation tables all seem to contain "256 512 1024", so no shrinking
> of prealloc_table is necessary.
> The OSTs in question have just reached the 85% level. We have a number
> of older OSS which are closer to 95% - I guess the problem doesn't show
> up there, because there is no room for further files anyhow...

This is considered very full for the filesystem, so it isn't very
surprising that you are seeing such messages. In the future, the flex_bg
option will be usable for new filesystems, but that won't help existing
filesystems today.

Cheers, Andreas
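A quick back-of-the-envelope check of the 32 MB-per-TB figure above, as
a sketch assuming ldiskfs defaults of 4 KiB blocks and 128 MiB block
groups, with one 4 KiB block bitmap per group and flex_bg not in use:

  # block groups per TiB: 1 TiB / 128 MiB per group
  echo $(( 1024 * 1024 / 128 ))       # 8192 groups per TiB

  # block-bitmap memory per TiB: one 4 KiB bitmap per group
  echo $(( 8192 * 4 / 1024 )) MiB     # 32 MiB per TiB

  # 8 OSTs of 8 TiB each, at 32 MiB of bitmaps per TiB
  echo $(( 8 * 8 * 32 )) MiB          # 2048 MiB, i.e. about 2 GiB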