Hi all,

a recent posting here (which I can't find atm) has pointed me to
http://jira.whamcloud.com/browse/LU-15, where an issue is discussed that
we seem to see as well: some OSS really get overloaded, and the log says

  slow journal start 36s due to heavy IO load
  slow commitrw commit 36s due to heavy IO load
  slow start_page_read 169s due to heavy IO load
  slow direct_io 34s due to heavy IO load
  ...

The bugzilla discussion seems to propose a number of steps to take on
each OSS as a workaround, among them setting
readcache_max_filesize=32M or readcache_max_filesize=0

I have checked the current value of this parameter and found
readcache_max_filesize=18446744073709551615
which translates to 16 EB (if I counted the powers of 1024 correctly).
Am I correct in assuming that this is the default value, and that this
default is meant to read "unlimited"? Or is our OSS configuration just
badly messed up?

Also, people recommend pinning the bitmaps to memory - how do you do that?

Preallocation tables all seem to contain "256 512 1024", so no shrinking
of prealloc_table is necessary.
The OSTs in question have just reached the 85% level. We have a number
of older OSS which are closer to 95% - I guess the problem doesn't show
up there, because there is no room for further files anyhow...

Regards,
Thomas
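A minimal sketch of how readcache_max_filesize can be inspected and
changed per OST with lctl on an OSS, assuming an obdfilter-based setup
with 1.8-style proc paths (parameter names and accepted syntax may
differ between Lustre versions):

  # show the current value for every OST served by this OSS
  lctl get_param obdfilter.*.readcache_max_filesize

  # the same values straight from /proc
  cat /proc/fs/lustre/obdfilter/*/readcache_max_filesize

  # apply the LU-15 workaround value; takes effect immediately but is
  # not persistent across a remount
  lctl set_param obdfilter.*.readcache_max_filesize=32M

Setting the parameter to 0 instead should effectively disable the read
cache for file data altogether, which is the more aggressive variant of
the same workaround.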
Andreas Dilger
2011-May-07 07:08 UTC
[Lustre-discuss] high OSS load - readcache_max_filesize
On 2011-05-05, at 11:39 AM, Thomas Roth <t.roth at gsi.de> wrote:
> a recent posting here (which I can't find atm) has pointed me to
> http://jira.whamcloud.com/browse/LU-15, where an issue is discussed that
> we seem to see as well: some OSS really get overloaded, and the log says
>
> slow journal start 36s due to heavy IO load
> slow commitrw commit 36s due to heavy IO load
> slow start_page_read 169s due to heavy IO load
> slow direct_io 34s due to heavy IO load
> ...
>
> The bugzilla discussion seems to propose a number of steps to take on
> each OSS as a workaround, among them setting
> readcache_max_filesize=32M or readcache_max_filesize=0
>
> I have checked the current value of this parameter and found
> readcache_max_filesize=18446744073709551615
> which translates to 16 EB (if I counted the powers of 1024 correctly).

Right - this is 2^64 - 1.

> Am I correct in assuming that this is the default value, and that this
> default is meant to read "unlimited"?

Correct.

> Or is our OSS configuration just badly messed up?
>
> Also, people recommend pinning the bitmaps to memory - how do you do that?

There is no mechanism to do this today. It is possible to preload the
bitmaps at mount time, or I guess it may be possible to write a program
that mapped the bitmaps from the disk and then mlock'd that memory, but
it would pin 32 MB of RAM per TB of filesystem. If an OSS has 8x 8 TB
OSTs, that is 2 GB of RAM. I think there are more efficient solutions
than this.

> Preallocation tables all seem to contain "256 512 1024", so no shrinking
> of prealloc_table is necessary.
> The OSTs in question have just reached the 85% level. We have a number
> of older OSS which are closer to 95% - I guess the problem doesn't show
> up there, because there is no room for further files anyhow...

This is considered very full for the filesystem, so it isn't very
surprising that you are seeing such messages. In the future, the flex_bg
option will be usable for new filesystems, but that won't help existing
filesystems today.

Cheers, Andreas
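A quick back-of-the-envelope check of the 32 MB-per-TB figure above, as
a sketch assuming ldiskfs defaults of 4 KiB blocks and 128 MiB block
groups, with one 4 KiB block bitmap per group and flex_bg not in use:

  # block groups per TiB: 1 TiB / 128 MiB per group
  echo $(( 1024 * 1024 / 128 ))       # 8192 groups per TiB

  # block-bitmap memory per TiB: one 4 KiB bitmap per group
  echo $(( 8192 * 4 / 1024 )) MiB     # 32 MiB per TiB

  # 8 OSTs of 8 TiB each, at 32 MiB of bitmaps per TiB
  echo $(( 8 * 8 * 32 )) MiB          # 2048 MiB, i.e. about 2 GiB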