Hello,

We have a Lustre 1.8.1 file system, about 60 TB in size, running on
RHEL 5 x86_64. (I can provide hardware details if anyone thinks they'd
be relevant.) We are seeing memory problems after several days of
sustained I/O into that file system. We are writing from a small number
of clients (4 - 5) at an average rate of 50 MB/s, with peaks of 350 MB/s.
We read all the data at least twice before deleting them.

During this operation, we notice the value of "buffers" reported in
'/proc/meminfo' on the OSSs involved increasing monotonically until it
apparently takes up all the system's memory - 32 GB. Then 'kswapd'
starts consuming a large amount of CPU, the load increases (100+), and
the system, including Lustre, slows to a crawl and becomes quite
useless. If we stop Lustre I/O at this point, 'kswapd' and the system
load calm down, but the "buffers" value does not decrease. Any further
I/O on the system (e.g. dd if=/dev/urandom of=/tmp/test ...) will then
cause 'kswapd' to run away again.

We have observed the monotonically increasing "buffers" condition with
non-Lustre I/O on systems running the Lustre 1.8.1 kernel
(2.6.18-128.1.14.el5_lustre.1.8.1), but we haven't gotten them to the
point where 'kswapd' goes wild.

Has anybody else seen anything like this?

David Simas
SLAC
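P.S. If anyone wants to watch the same numbers, something as simple as
this (plain /proc, nothing Lustre-specific) is enough to see the growth:

    # sample the relevant /proc/meminfo fields once a minute
    watch -n 60 'grep -E "^(MemFree|Buffers|Cached):" /proc/meminfo'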
Do you have OSS readcache on?

Check out https://bugzilla.lustre.org/show_bug.cgi?id=20778 and
https://bugzilla.lustre.org/show_bug.cgi?id=18571

David

David Simas wrote:
> During this operation, we notice the value of "buffers" reported in
> '/proc/meminfo' on the OSSs involved increasing monotonically until it
> apparently takes up all the system's memory - 32 GB. Then 'kswapd'
> starts consuming a large amount of CPU, the load increases (100+), and
> the system, including Lustre, slows to a crawl and becomes quite
> useless.
> [...]
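P.S. If readcache does turn out to be the culprit, I believe the 1.8
tunables to switch it off on each OSS are the following - do
double-check the parameter names against your build:

    # on each OSS: disable the OSS read cache
    lctl set_param obdfilter.*.read_cache_enable=0
    # optionally also the write-through cache, if the bugs above apply
    lctl set_param obdfilter.*.writethrough_cache_enable=0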
On Mon, 2009-10-12 at 17:06 -0700, David Simas wrote:
> Hello,

Hi,

> During this operation, we notice the value of "buffers" reported in
> '/proc/meminfo' on the OSSs involved increasing monotonically until it
> apparently takes up all the system's memory - 32 GB.

This would likely be OSS read cache, if you have it enabled. If you do,
you should disable it due to a potential corruption issue. Details were
given previously on this list on how to do that. Check the archives.

But that's not directly related to what you are seeing. Having
"buffers" consume all of available memory is SOP (Standard Operating
Procedure) for Linux. The philosophy is that "free" (unused) memory is
wasted memory, and as such, any memory not needed by applications or
other kernel processing is used to buffer disk I/O. The performance
benefit of that is obvious, I think.

> Then 'kswapd' starts consuming a large amount of CPU, the load
> increases (100+), and the system, including Lustre, slows to a crawl
> and becomes quite useless.

This doesn't sound normal or good.

> If we stop Lustre I/O at this point, 'kswapd' and the system load calm
> down, but the "buffers" value does not decrease.

Right. The buffers will not get "emptied" until something else needs
the memory. Again, unused memory is wasted memory.

> We have observed the monotonically increasing "buffers" condition with
> non-Lustre I/O on systems running the Lustre 1.8.1 kernel
> (2.6.18-128.1.14.el5_lustre.1.8.1),

Indeed. Filling memory with the buffer cache is a standard (i.e.
non-Lustre-specific) behaviour, and you will find the same thing on
non-Lustre kernels as well.

b.
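A quick way to confirm that those buffers really are reclaimable cache
and not a leak - assuming your kernel carries the stock drop_caches
support (it went in around 2.6.16, so the 2.6.18-based Lustre kernel
should have it):

    # flush dirty data, then drop clean pagecache, dentries and inodes
    sync
    echo 3 > /proc/sys/vm/drop_caches

If "Buffers" collapses after that, the memory was only cache, and the
real question is why reclaim struggles so badly under load.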
This sounds very much like a problem we saw before we changed the
lru_size from dynamic to a fixed size.

-- Andrew

-----Original Message-----
From: David Simas
Sent: Monday, October 12, 2009 6:07 PM
To: lustre-discuss at lists.lustre.org
Subject: [Lustre-discuss] Memory (?) problem with 1.8.1
[...]
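For reference, the LDLM LRU can be pinned from the clients; in 1.8 a
non-zero lru_size disables the dynamic resizing. Something like the
following - the value is per lock namespace, so tune it to your setup:

    # on the clients: fix the OSC lock LRUs at 400 locks each
    lctl set_param ldlm.namespaces.*osc*.lru_size=400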
We faced this problem on the Lustre servers in our cluster with a GigE
network. We found that increasing the following value in
/etc/sysctl.conf forces kswapd to kick in a lot earlier and prevents
the scenario you are describing. Our servers have only 8 GB of memory;
with 32 GB of system memory you might want to bump it up to 2 GB or
even 4 GB.

# Control the min_free_kbytes
vm.min_free_kbytes = 1048576

Hope this helps.

Nirmal
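(For completeness: the value is in kilobytes, so 1048576 is 1 GB and
2097152 would be 2 GB. It can also be changed on a running system:

    # apply immediately, without a reboot
    sysctl -w vm.min_free_kbytes=2097152
    # or, after editing /etc/sysctl.conf, reload it
    sysctl -p

Either way, keep the /etc/sysctl.conf entry so the setting survives a
reboot.)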
On 13-Oct-09, at 04:58, Brian J. Murrell wrote:
> On Mon, 2009-10-12 at 17:06 -0700, David Simas wrote:
>> Then 'kswapd' starts consuming a large amount of CPU, the load
>> increases (100+), and the system, including Lustre, slows to a crawl
>> and becomes quite useless.
>
> This doesn't sound normal or good.

There is a recent bug for memory pressure on the OSS. I believe it was
fixed for the next release.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.