Jérémie Dubois-Lacoste
2012-Sep-25 15:12 UTC
[Lustre-discuss] ldlm_locks memory usage crashes OSS
Hi All,

We have a problem with one of our OSS nodes crashing out of memory, on a system that we recently re-installed. Our setup uses two OSS with two OSTs on each, running Lustre 2.1.3 on CentOS 6.3 with kernel 2.6.32-220.17.1.el6_lustre.x86_64 (so, 64-bit).

One of the OSS nodes runs low on memory until it triggers a kernel panic. Checking with 'slabtop', the usage comes from the "ldlm_locks" slab, which keeps growing until the crash. The growth rate is rather quick, close to 1 MB per second, so it exhausts all memory in about an hour.

It may be related to the following bug:
https://bugzilla.lustre.org/show_bug.cgi?id=19950
However, that was for Lustre 1.6, so I'm not sure.

I tried rebooting and resynchronizing with the MDS afterwards, but the same thing happens again. Now that I check the other OSS (the one that is OK) carefully, the same thing seems to happen there too, but at a much slower rate. Not sure yet.

This may be a consequence, or related somehow: we are using Lustre on a computing cluster with Sun Grid Engine 6.2u5, and any job we submit takes a *HUGE* amount of memory compared to what it needed before our upgrade (and to what it takes if we run it directly, not through SGE). If the measurements we get from SGE are correct, the difference can be up to x1000, so many jobs get killed. Sorry if this is not the proper place to post it, but I have the intuition that it could be related, and some people here may be used to the Lustre+SGE combination.

Any suggestion welcome!

Thanks,

Jérémie
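P.S. For reference, the slab growth can be watched with something like this (a rough sketch; the /proc/slabinfo column layout varies between kernels, so the awk field numbers may need adjusting):

    # Print the ldlm_locks slab usage in MB every 10 seconds.
    # On 2.6.32 kernels, column 2 of /proc/slabinfo is active objects
    # and column 4 is the object size in bytes.
    while true; do
        grep '^ldlm_locks' /proc/slabinfo \
            | awk '{printf "%.1f MB\n", $2 * $4 / (1024 * 1024)}'
        sleep 10
    done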
Jérémie Dubois-Lacoste
2012-Sep-26 10:14 UTC
[Lustre-discuss] ldlm_locks memory usage crashes OSS
So I still have no idea what the cause of this was, but we shut down the entire cluster, rebooted the head node (with the MDS) twice, and things were working fine again. Good old method! Maybe something was wrong with one or several clients stuck somehow in a lock. If it happens again I'll post it. Thanks anyway!

Jérémie
Mohr Jr, Richard Frank (Rick Mohr)
2012-Sep-26 14:15 UTC
[Lustre-discuss] ldlm_locks memory usage crashes OSS
How many clients are using your file system? We had an issue at one point with our MDS running out of memory due to large numbers of locks. I did some digging and found that each client was set to cache 1200 locks (100 per core), and the lifespan of the cached locks was very long (although I can't remember the exact value). A quick calculation showed that, based on our machine size, the MDS did not have enough memory to support this many locks. These settings had been in use for years, but we never had a problem until a user ran a very large job which opened/closed thousands of files per node. The number of cached locks would slowly build until the MDS OOMed. We ended up reducing the number of cached locks per client, and this resolved the issue.

I don't know if this could be the same type of problem affecting your system, but I thought I would share the details in case it was relevant.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
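P.S. For what it's worth, the reduction itself is a one-liner on each client. A sketch (the value 400 is only an example, not what we used, and the *osc* pattern assumes you only want to cap locks for OST targets):

    # Cap the client-side LRU at 400 cached locks per OST target.
    # Note: setting lru_size to a non-zero value also disables the
    # dynamic LRU resizing that is enabled by default.
    lctl set_param ldlm.namespaces.*osc*.lru_size=400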
Jérémie Dubois-Lacoste
2012-Sep-26 15:34 UTC
[Lustre-discuss] ldlm_locks memory usage crashes OSS
Hi,

We have typically ~80 clients running. In our case the locks were eating memory on the OSS, not the MDS, but yes, it might still have the same cause.

I'm not sure how to find the lifespan and number of locks per node; I think we're using the default settings. Do you know how to check this?

Cheers,

Jérémie
Mohr Jr, Richard Frank (Rick Mohr)
2012-Sep-26 17:07 UTC
[Lustre-discuss] ldlm_locks memory usage crashes OSS
On Sep 26, 2012, at 11:34 AM, Jérémie Dubois-Lacoste wrote:

> We have typically ~80 clients running.
> In our case the locks were eating memory on the OSS, not the MDS, but
> yes it might still have the same cause.

Hmmm. With only 80 clients I wouldn't expect that locks would use up too much memory. But the number of cached locks is controlled on a per-target basis, so if the OSS node has quite a few OSTs, the aggregate number of locks could get large.

> I'm not sure how to find the lifespan and number of locks per node. I
> think we're using the default settings. Do you know how to check this?

Take a look at /proc/fs/lustre/ldlm/namespaces/*/{lru_max_age,lru_size}. The lru_max_age file shows the amount of time a lock will be cached before it ages out. The lru_size file shows the max number of locks that will be cached.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
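P.S. If you have lctl handy, the same values can also be read with get_param (a sketch; you may want to narrow the namespace pattern to specific targets):

    # Show the cached-lock lifespan and LRU limit for every namespace.
    lctl get_param ldlm.namespaces.*.lru_max_age
    lctl get_param ldlm.namespaces.*.lru_size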
Jérémie Dubois-Lacoste
2012-Sep-26 17:15 UTC
[Lustre-discuss] ldlm_locks memory usage crashes OSS
So the max number is 800, and the maximum age is 36000000 (I don't know the unit, but it looks like an hour). We have only two OSTs on each of our two OSS nodes. With 80 clients, do you think there is any chance these settings are too high?
Mohr Jr, Richard Frank (Rick Mohr)
2012-Sep-26 19:01 UTC
[Lustre-discuss] ldlm_locks memory usage crashes OSS
On Sep 26, 2012, at 1:15 PM, Jérémie Dubois-Lacoste wrote:

> So the max number is 800, and the maximum age is 36000000 (I don't know
> the unit, but it looks like an hour).
> We have only two OSTs on each of our two OSS nodes.
> With 80 clients, do you think there is any chance these settings are too high?

Based on that info, it doesn't seem like you should be running into a problem with too many locks. If you see the issue again, I would suggest gathering some info about lock usage from the clients. The /proc/fs/lustre/ldlm/namespaces/*/lock_count file should tell you how many locks the client has for each target. From that you should be able to get an idea of how many locks are on the OSS and see if that would account for the memory usage.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
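P.S. To total those counts across the cluster, something like this might work (a sketch; clients.txt is a hypothetical file listing your client hostnames, and passwordless ssh is assumed):

    # Sum the cached-lock counts reported by every client.
    for host in $(cat clients.txt); do
        ssh "$host" "cat /proc/fs/lustre/ldlm/namespaces/*/lock_count" \
            | awk -v h="$host" '{s += $1} END {print h, s}'
    done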
Jérémie Dubois-Lacoste
2012-Sep-26 21:59 UTC
[Lustre-discuss] ldlm_locks memory usage crashes OSS
Ok, I'll keep it in mind. Thanks!

Jérémie