Responding to what Sebastien has written:

> Hi,
>
> Just a small feedback from our own experience.
>
> I agree with Brian about the fact that there is no strong limit on the
> number of OSTs per OSS in the Lustre code. But one should really take
> into account the available memory on OSSes when defining the number of
> OSTs per OSS (and so the size of each OST). If you do not have 1 GB or
> 1.2 GB of memory per OST on your OSSes, you will run into serious
> trouble with "out of memory" messages.
>
> For instance, if you want 8 OSTs per OSS, your OSSes should have at
> least 10 GB of RAM.
>
> Unfortunately we experienced those "out of memory" problems, so I advise
> you to read the Lustre Operations Manual, chapter 33.12, "OSS RAM Size
> for a Single OST".
>
> Cheers,
> Sebastien.

We have one OSS running Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.
This OSS has 16 GB of RAM for 76 TB of formatted Lustre disk space.

[root at oss4 ~]# cat /proc/meminfo
MemTotal:     16439360 kB
MemFree:         88204 kB

Client sees: ic-mds1 at o2ib:/crew8   Total Usable Space 76 TB

The OSS has 6 JBODs, each of which is partitioned in two parts to stay
below the Lustre 8 TB per-partition limit.
/dev/sdb1             6.3T  3.8T  2.3T  63% /srv/lustre/OST/crew8-OST0000
/dev/sdb2             6.3T  3.7T  2.3T  62% /srv/lustre/OST/crew8-OST0001
/dev/sdc1             6.3T  3.8T  2.3T  63% /srv/lustre/OST/crew8-OST0002
/dev/sdc2             6.3T  3.8T  2.2T  64% /srv/lustre/OST/crew8-OST0003
/dev/sdd1             6.3T  3.8T  2.2T  64% /srv/lustre/OST/crew8-OST0004
/dev/sdd2             6.3T  4.2T  1.8T  70% /srv/lustre/OST/crew8-OST0005
/dev/sdi1             6.3T  4.3T  1.8T  71% /srv/lustre/OST/crew8-OST0006
/dev/sdi2             6.3T  3.8T  2.2T  64% /srv/lustre/OST/crew8-OST0007
/dev/sdj1             6.3T  3.8T  2.3T  63% /srv/lustre/OST/crew8-OST0008
/dev/sdj2             6.3T  3.8T  2.2T  63% /srv/lustre/OST/crew8-OST0009
/dev/sdk1             6.3T  3.7T  2.3T  62% /srv/lustre/OST/crew8-OST0010
/dev/sdk2             6.3T  3.7T  2.3T  63% /srv/lustre/OST/crew8-OST0011

As you can see, this is nowhere near the recommendation of 1 GB of RAM
per OST. Yes, we do occasionally, under load, see kernel panics due to
what we believe is insufficient memory and swap. These panics occur
approximately once per month. We also see watchdog messages stating
"swap page allocation failure", sometimes a day prior to a kernel panic.
Only after this Lustre disk was up and running was I enlightened that
this was too much load for a single OSS. Ah well, live and learn. I am
planning to split this one large group across two OSSes in the next
month. Hopefully the kernel panics and watchdog errors will go away with
the OST load shared across two OSS machines.

Just one real-life scenario for your consideration.

megan
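For what it's worth, here is a rough sketch of that RAM-per-OST check,
assuming the OSTs are mounted under /srv/lustre/OST as in the listing
above (the mount-point glob is site-specific, and the 1.2 GB/OST figure
is only the guideline quoted in this thread, not something Lustre
enforces):

    #!/bin/sh
    # Rough check of OSS RAM against the ~1.2 GB-per-OST guideline
    # discussed in this thread.  The mount-point glob is site-specific
    # (taken from the df listing above); adjust for your own OSS.
    num_osts=$(ls -d /srv/lustre/OST/crew8-OST* 2>/dev/null | wc -l)
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    need_kb=$((num_osts * 1228800))   # ~1.2 GB per OST, in kB
    echo "OSTs: $num_osts  RAM: $((mem_kb / 1024)) MB  suggested minimum: $((need_kb / 1024)) MB"
    if [ "$mem_kb" -lt "$need_kb" ]; then
        echo "WARNING: below the suggested RAM for this many OSTs"
    fi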
Hi,

To me:
12 OSTs x 1.2 GB = 14.4 GB < 16 GB

So you are clearly within the recommendation.

Cheers,
Sebastien.

Ms. Megan Larko wrote:
> We have one OSS running Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.
> This OSS has 16 GB of RAM for 76 TB of formatted Lustre disk space.
> [...]
> As you can see, this is nowhere near the recommendation of 1 GB of RAM
> per OST.
Greetings Sebastien,

On Wed, Aug 19, 2009 at 10:39 AM, Sébastien Buisson
<sebastien.buisson at bull.net> wrote:
> To me:
> 12 OSTs x 1.2 GB = 14.4 GB < 16 GB
>
> So you are clearly within the recommendation.

I thought I would be within the spec *if* my OSTs were smaller units.
As they are JBODs in sections of 6+ TB each, I thought I was "coloring
outside the lines".

Thanks,
megan
On Aug 19, 2009 10:28 -0400, Ms. Megan Larko wrote:
> We have one OSS running Lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp.
> This OSS has 16 GB of RAM for 76 TB of formatted Lustre disk space.
>
> The OSS has 6 JBODs, each of which is partitioned in two parts to stay
> below the Lustre 8 TB per-partition limit.
> [...]
> As you can see, this is nowhere near the recommendation of 1 GB of RAM
> per OST. Yes, we do occasionally, under load, see kernel panics due to
> what we believe is insufficient memory and swap.

If you have a need for large capacity, but not necessarily peak
throughput, you could shrink the journals on these filesystems (which
themselves consume about 4.5 GB of RAM). It is likely you can't utilize
the full bandwidth of these disks anyway, unless you have a lot of
network bandwidth into this node.

    umount /dev/sdX
    e2fsck /dev/sdX
    tune2fs -O ^has_journal /dev/sdX
    tune2fs -j -J size=128 /dev/sdX
    mount /dev/sdX

or, when creating the filesystem:

    mkfs.lustre --mkfsoptions="-J size=128"

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
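In case it helps anyone with many OSTs to convert, here is a sketch of
the same journal-shrink sequence applied in a loop. The device names
are only examples taken from the df listing earlier in the thread, and
each OST must be idle, unmounted, and clean before its journal is
removed:

    #!/bin/sh
    # Sketch only: shrink the ldiskfs journal on several OSTs to 128 MB,
    # following the per-device steps Andreas gives above.  Device names
    # are examples from this thread; adjust for your own OSS.
    for dev in /dev/sdb1 /dev/sdb2 /dev/sdc1 /dev/sdc2; do
        umount "$dev" || exit 1          # OST must not be in use
        e2fsck -f "$dev"                 # filesystem must be clean before touching the journal
        tune2fs -O ^has_journal "$dev"   # drop the existing (large) journal
        tune2fs -j -J size=128 "$dev"    # recreate it at 128 MB
    done
    # Remount each OST afterwards (mount /dev/sdX as above, or via /etc/fstab).

The smaller journal trades some streaming-write performance for memory,
which matches Andreas's point that these disks likely cannot be
saturated over the available network bandwidth anyway.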