Piotr Wadas
2009-Oct-18 22:04 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
Hello,

I'm happy to report a working server and patched-client setup: Lustre 1.8.1 + kernel 2.6.27.23 + DRBD 8.3.4 on Debian GNU/Linux (x86) sid/experimental. The test install was made with two VMware-based virtual machines, with the base system (also Debian GNU/Linux) acting as the Lustre patched client. A few notes:

* At first I tried with really small partitions, just a few MB, and mkfs.lustre refused with "file system too small for a journal" - reasonably enough.
* A test install on VirtualBox did not succeed because of host-only networking bugs/limitations in VirtualBox.
* Confirmed that one can use LVM PVs or LVs as Lustre block devices.
* Confirmed working with MGS/MDT/OSTs (two OSTs for now) AND a client on the very same (fully virtual) machine, for testing purposes, with no problems so far.

Now, I did a simple calculation of MDT size as described in the Lustre 1.8.1 manual and set up the MDT as recommended. The question is: whether or not my calculation was right, what actually happens if the MDT partition runs out of space? Is there any chance to dump the whole combined MGS+MDT filesystem, supply a bigger block device, or extend the partition with some e2fsprogs/tune2fs trick? The assumption here is that no matter how big the MDT is, it will be exhausted someday. One possible solution is simply to add/create another filesystem, with another MGS/MDT. But the question persists :)

And one more thing - I use a combined MGS/MDT. What about MGS size? That is, if I use a separate MGS and MDT, what size should the MGS have, and how does the management service work with respect to its block-device storage?

Regards,
Piotr Wadas
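PS. In case anyone wants to reproduce a similar minimal setup, the LVM and mkfs.lustre steps look roughly like the sketch below. The volume group, sizes, fsname and NID are placeholder examples, not copied from my actual machines, so adjust to taste; the main point is to keep the MDT LV large enough to hold a journal.

# combined MGS/MDT on a small LV (a few hundred MB avoids the
# "file system too small for a journal" error I hit with a few-MB device)
lvcreate -L 512M -n testfs_mdt vg0
mkfs.lustre --fsname=testfs --mgs --mdt /dev/vg0/testfs_mdt

# two small OSTs, pointing at the MGS node
lvcreate -L 2G -n testfs_ost0 vg0
lvcreate -L 2G -n testfs_ost1 vg0
mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.1.10@tcp /dev/vg0/testfs_ost0
mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.1.10@tcp /dev/vg0/testfs_ost1

# mount the targets on the server, then the filesystem on the client
mount -t lustre /dev/vg0/testfs_mdt /mnt/mdt
mount -t lustre /dev/vg0/testfs_ost0 /mnt/ost0
mount -t lustre /dev/vg0/testfs_ost1 /mnt/ost1
mount -t lustre 192.168.1.10@tcp:/testfs /mnt/lustre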
Andreas Dilger
2009-Oct-20 16:15 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On 18-Oct-09, at 16:04, Piotr Wadas wrote:
> Now, I did a simple calculation of MDT size as described in the Lustre
> 1.8.1 manual and set up the MDT as recommended. The question is: whether
> or not my calculation was right, what actually happens if the MDT
> partition runs out of space? Is there any chance to dump the whole
> combined MGS+MDT filesystem, supply a bigger block device, or extend the
> partition with some e2fsprogs/tune2fs trick? The assumption here is that
> no matter how big the MDT is, it will be exhausted someday.

It is true that the MDT device can become full at some point, but this happens fairly rarely, given that most Lustre HPC users have very large files and the size of the MDT is MUCH smaller than the space needed for the file data. The maximum size of an MDT is 8TB, and if you format the filesystem with "-i 2048" you can get 4B inodes therein, which is the maximum. Even the largest filesystem we have seen doesn't use that many inodes.

Once ZFS backing filesystems are available, this fixed inode limit will be gone (for all practical purposes). ZFS will allow up to 2^48 files per fileset, and with CMD (when it finally arrives) it will allow multiple MDTs in a single Lustre filesystem.

> One possible solution is simply to add/create another filesystem, with
> another MGS/MDT. But the question persists :)

If you are using LVM you can increase the size of the MDT device and resize the filesystem to add more inodes to it.

> And one more thing - I use a combined MGS/MDT. What about MGS size? That
> is, if I use a separate MGS and MDT, what size should the MGS have, and
> how does the management service work with respect to its block-device
> storage?

The MGS needs only some MB of space; maybe 128MB is the most it would ever need.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
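To make the inode arithmetic explicit, and to show where the "-i 2048" goes at format time (the device path and fsname below are only placeholders): one inode per 2048 bytes of MDT space on an 8TB device gives

  8 TB / 2048 bytes per inode = 2^43 / 2^11 = 2^32, i.e. about 4 billion inodes,

which is the per-filesystem inode maximum Andreas mentions. The option is passed through to the backing filesystem, along the lines of:

  mkfs.lustre --fsname=testfs --mgs --mdt --mkfsoptions="-i 2048" /dev/vg0/mdt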
Brian J. Murrell
2009-Oct-20 16:33 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On Tue, 2009-10-20 at 10:15 -0600, Andreas Dilger wrote:
>
> It is true that the MDT device can become full at some point, but this
> happens fairly rarely, given that most Lustre HPC users have very large
> files and the size of the MDT is MUCH smaller than the space needed for
> the file data.

Indeed. For some (very) anecdotal experience, witness my own very small Lustre filesystem usage:

$ lfs df
UUID                 1K-blocks      Used  Available  Use%  Mounted on
mds1_UUID             18348668   1327240   15972852    7%  /mnt/lustre[MDT:0]
client-OST0000_UUID   20642428  14883944    4709844   72%  /mnt/lustre[OST:0]
client-OST0001_UUID   20642428  14908260    4685528   72%  /mnt/lustre[OST:1]
client-OST0002_UUID   20642428  15055492    4538296   72%  /mnt/lustre[OST:2]
client-OST0003_UUID   20642428  14905716    4688072   72%  /mnt/lustre[OST:3]
client-OST0004_UUID   20642428  14871520    4722268   72%  /mnt/lustre[OST:4]

filesystem summary:  103212140  74624932   23344008   72%  /mnt/lustre

$ lfs df -i
UUID                    Inodes     IUsed      IFree  IUse%  Mounted on
mds1_UUID              5242880   2109580    3133300    40%  /mnt/lustre[MDT:0]
client-OST0000_UUID    1310720    208666    1102054    15%  /mnt/lustre[OST:0]
client-OST0001_UUID    1310720    538201     772519    41%  /mnt/lustre[OST:1]
client-OST0002_UUID    1310720    388754     921966    29%  /mnt/lustre[OST:2]
client-OST0003_UUID    1310720    292766    1017954    22%  /mnt/lustre[OST:3]
client-OST0004_UUID    1310720    469037     841683    35%  /mnt/lustre[OST:4]

filesystem summary:    5242880   2109580    3133300    40%  /mnt/lustre

As you can see, my MDT, at just less than 20% of the size of my total OST storage, is quite oversized (by 10x perhaps?) for the data I am storing, and I store lots of small files -- Lustre, kernel and other misc source trees. I don't do any striping, however, which helps keep MDT usage lower.

> If you are using LVM you can increase the size of the MDT device and
> resize the filesystem to add more inodes to it.

Ahhh. I don't think I knew that resizing actually increased inode counts.

Just for the experience of it (and when I can find a moment to do it), I will probably transplant my MDT into a newly created device (on LVM of course, given I do everything on LVM), much smaller than my current one. Indeed, I could try just shrinking the existing one, but I want to create a new one from scratch, complete with a mountconf-style UUID, and move the MDT data into it.

b.
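For what it is worth, the rough shape of such a transplant would be a file-level backup and restore of the MDT. The sketch below is untested, and the LV names, sizes and fsname are placeholders, so treat it as an outline rather than a recipe (with a smaller target you cannot simply dd the old device across, and the manual's MDT backup/restore section covers details such as config regeneration that are skipped here):

# with the MDT unmounted from Lustre, back it up at the file level,
# preserving the extended attributes that hold the Lustre metadata
mount -t ldiskfs /dev/vg0/old_mdt /mnt/old_mdt
cd /mnt/old_mdt
getfattr -R -d -m '.*' -e hex -P . > /tmp/mdt_ea.bak
tar czf /tmp/mdt_backup.tgz --sparse .
cd /; umount /mnt/old_mdt

# create and format the new, smaller MDT, then restore files and EAs
lvcreate -L 4G -n new_mdt vg0
mkfs.lustre --fsname=testfs --mgs --mdt /dev/vg0/new_mdt
mount -t ldiskfs /dev/vg0/new_mdt /mnt/new_mdt
cd /mnt/new_mdt
tar xzpf /tmp/mdt_backup.tgz
setfattr --restore=/tmp/mdt_ea.bak
cd /; umount /mnt/new_mdt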
Nirmal Seenu
2009-Oct-21 18:06 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
Could you please let us know the correct procedure to grow an MDT partition that is on an LVM volume? Do I use resize2fs after I add more extents to the MDT volume using the "lvextend" command?

I am using Lustre 1.8.0.1 + e2fsprogs-1.40.11.sun1-0redhat.x86_64, and the resize2fs version is 1.40.11.sun1 (17-June-2008).

Thanks
Nirmal
Brian J. Murrell
2009-Oct-21 18:14 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On Wed, 2009-10-21 at 13:06 -0500, Nirmal Seenu wrote:
> Could you please let us know the correct procedure to grow an MDT
> partition that is on an LVM volume?
>
> Do I use resize2fs after I add more extents to the MDT volume using the
> "lvextend" command?

Yes, that's the theory.

> I am using Lustre 1.8.0.1 + e2fsprogs-1.40.11.sun1-0redhat.x86_64, and
> the resize2fs version is 1.40.11.sun1 (17-June-2008).

Yes; best be sure you are using the latest e2fsprogs we have released. Of course, I would be remiss not to point out that we DO NOT test this feature _at_all_ and that you should have a good, tested backup on hand, just in case. Personally, I would also take the performance hit and make a snapshot, just to add a belt to my suspenders. I would remove the snapshot once I was happy with the result.

b.
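Spelling that out as commands, for the archives: the device, mount point and sizes below are placeholders, and as noted above this path is untested, so keep that backup handy. The MDT should be unmounted from Lustre while the offline resize runs:

# take an LVM snapshot as a safety net before touching anything
lvcreate -s -L 2G -n mdt_snap /dev/vg0/mdt

# grow the LV, check the filesystem, then grow it into the new space
lvextend -L +10G /dev/vg0/mdt
e2fsck -f /dev/vg0/mdt
resize2fs /dev/vg0/mdt

# remount the MDT and, once happy with the result, drop the snapshot
mount -t lustre /dev/vg0/mdt /mnt/mdt
lvremove /dev/vg0/mdt_snap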
Bernd Schubert
2009-Oct-23 09:51 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On Tuesday 20 October 2009, Andreas Dilger wrote:
> On 18-Oct-09, at 16:04, Piotr Wadas wrote:
> > Now, I did a simple calculation of MDT size as described in the Lustre
> > 1.8.1 manual and set up the MDT as recommended. The question is: whether
> > or not my calculation was right, what actually happens if the MDT
> > partition runs out of space? Is there any chance to dump the whole
> > combined MGS+MDT filesystem, supply a bigger block device, or extend the
> > partition with some e2fsprogs/tune2fs trick? The assumption here is that
> > no matter how big the MDT is, it will be exhausted someday.
>
> It is true that the MDT device can become full at some point, but this
> happens fairly rarely, given that most Lustre HPC users have very large
> files and the size of the MDT is MUCH smaller than the space needed for
> the file data. The maximum size of an MDT is 8TB, and if you format the

Is that still true with recent kernels such as the one from SLES11? I thought ldiskfs is based on ext4 there? So we should have at least 16TiB, and I'm not sure whether all the e2fsprogs patches needed for 64-bit maximum sizes have already landed?

Thanks,
Bernd

--
Bernd Schubert
DataDirect Networks
Andreas Dilger
2009-Oct-23 10:57 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On 2009-10-23, at 03:51, Bernd Schubert wrote:
> On Tuesday 20 October 2009, Andreas Dilger wrote:
>> On 18-Oct-09, at 16:04, Piotr Wadas wrote:
>>> Now, I did a simple calculation of MDT size as described in the Lustre
>>> 1.8.1 manual and set up the MDT as recommended. The question is: whether
>>> or not my calculation was right, what actually happens if the MDT
>>> partition runs out of space? Is there any chance to dump the whole
>>> combined MGS+MDT filesystem, supply a bigger block device, or extend the
>>> partition with some e2fsprogs/tune2fs trick? The assumption here is that
>>> no matter how big the MDT is, it will be exhausted someday.
>>
>> It is true that the MDT device can become full at some point, but this
>> happens fairly rarely, given that most Lustre HPC users have very large
>> files and the size of the MDT is MUCH smaller than the space needed for
>> the file data. The maximum size of an MDT is 8TB, and if you format the
>
> Is that still true with recent kernels such as the one from SLES11? I
> thought ldiskfs is based on ext4 there? So we should have at least 16TiB,
> and I'm not sure whether all the e2fsprogs patches needed for 64-bit
> maximum sizes have already landed?

16TB LUN support is still under testing, so it isn't officially supported yet. The upstream e2fsprogs don't have 64-bit support finished yet (also under testing), and when that is done there will need to be additional testing with Lustre. There is some question of whether SLES11 will get all of the fixes needed for >16TB support, or if it is better to get that from RHEL6 instead.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Piotr Wadas
2009-Nov-07 01:01 UTC
[Lustre-discuss] 1.8.1.1 too :) # Re: 1.8.1 test setup achieved, what about maximum mdt size
[..]
> As you can see, my MDT, at just less than 20% of the size of my total
> OST storage, is quite oversized (by 10x perhaps?) for the data I am
> storing, and I store lots of small files -- Lustre, kernel and other
> misc source trees. I don't do any striping, however, which helps keep
> MDT usage lower.
>
> > If you are using LVM you can increase the size of the MDT device and
> > resize the filesystem to add more inodes to it.
>
> Ahhh. I don't think I knew that resizing actually increased inode
> counts.
>
> Just for the experience of it (and when I can find a moment to do it),
> I will probably transplant my MDT into a newly created device (on LVM
> of course, given I do everything on LVM), much smaller than my current
> one. Indeed, I could try just shrinking the existing one, but I want to
> create a new one from scratch, complete with a mountconf-style UUID,
> and move the MDT data into it.

Well, I'm happy to report the test setup "upgraded" to 1.8.1.1 ;-)

Now, regarding this topic - I am experimenting with the Lustre quota features. How does MDT usage relate to the storage of quota information? It seems quota is still based on quota v1/v2 files somehow, although, as described in chapter 9.1 of the Lustre Manual, the usrquota and grpquota mount options are obsolete on the filesystem client. I do not expect surprises here - the space needed to store quota information is probably not worth mentioning - but it is interesting how it is actually stored. Is it some classic per-target quota combined with a live per-OST summary?

Regards,
DT
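PS. For reference, the quota workflow I am experimenting with looks roughly like the commands below. The fsname, user name, mount point and limits are placeholders, and the parameter names are my reading of the 1.8 manual, so treat them as assumptions rather than gospel.

# enable user/group quota enforcement on the servers (run from the MGS)
lctl conf_param testfs.mdt.quota_type=ug
lctl conf_param testfs.ost.quota_type=ug

# build the quota files by scanning the filesystem (run on a client)
lfs quotacheck -ug /mnt/lustre

# set block (KB) and inode soft/hard limits for a user, then check usage
lfs setquota -u someuser -b 307200 -B 309200 -i 10000 -I 11000 /mnt/lustre
lfs quota -u someuser /mnt/lustre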