Piotr Wadas
2009-Oct-18 22:04 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
Hello,

I'm happy to report a working server and patched-client setup: Lustre 1.8.1 + kernel 2.6.27.23 + DRBD 8.3.4 on Debian GNU/Linux (x86) sid/experimental. The test install was made with two VMware-based virtual machines, with the base system (also Debian GNU/Linux) acting as the Lustre patched client. A few notes:

* At first I tried with really small partitions, just a few MB, and mkfs.lustre refused with "file system too small for a journal" - reasonably enough.
* A test install on VirtualBox did not succeed because of host-only networking bugs/limitations in VirtualBox.
* Confirmed that one can use LVM PVs or LVs as Lustre block devices.
* Confirmed working with MGS/MDT/OSTs (two OSTs for now) AND a client on the very same (fully virtual) machine, for testing purposes, with no problems so far.

Now, I did a simple calculation of MDT size as described in the Lustre 1.8.1 manual and set up the MDT as recommended. The question is: whether or not my calculation was right, what actually happens if the MDT partition runs out of space? Is there any chance to dump the whole combined MGS+MDT filesystem, supply a bigger block device, or extend the partition with some e2fsprogs/tune2fs trick? The assumption here is that no matter how big the MDT is, it will be exhausted someday. One possible solution is simply to add/create another filesystem, with another MGS/MDT. But the question persists :)

And one more thing - I use a combined MGS/MDT. What about MGS size? That is, if I use a separate MGS and MDT, what size should the MGS have, and how does the management service work with respect to its block-device storage?

Regards,
Piotr Wadas
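PS. In case anyone wants to reproduce a similar minimal setup, the LVM and mkfs.lustre steps look roughly like the sketch below. The volume group, sizes, fsname and NID are placeholder examples, not copied from my actual machines, so adjust to taste; the main point is to keep the MDT LV large enough to hold a journal.

# combined MGS/MDT on a small LV (a few hundred MB avoids the
# "file system too small for a journal" error I hit with a few-MB device)
lvcreate -L 512M -n testfs_mdt vg0
mkfs.lustre --fsname=testfs --mgs --mdt /dev/vg0/testfs_mdt

# two small OSTs, pointing at the MGS node
lvcreate -L 2G -n testfs_ost0 vg0
lvcreate -L 2G -n testfs_ost1 vg0
mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.1.10@tcp /dev/vg0/testfs_ost0
mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.1.10@tcp /dev/vg0/testfs_ost1

# mount the targets on the server, then the filesystem on the client
mount -t lustre /dev/vg0/testfs_mdt /mnt/mdt
mount -t lustre /dev/vg0/testfs_ost0 /mnt/ost0
mount -t lustre /dev/vg0/testfs_ost1 /mnt/ost1
mount -t lustre 192.168.1.10@tcp:/testfs /mnt/lustre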
Andreas Dilger
2009-Oct-20 16:15 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On 18-Oct-09, at 16:04, Piotr Wadas wrote:
> Now, I did a simple calculation of MDT size as described in the Lustre
> 1.8.1 manual and set up the MDT as recommended. The question is: whether
> or not my calculation was right, what actually happens if the MDT
> partition runs out of space? Is there any chance to dump the whole
> combined MGS+MDT filesystem, supply a bigger block device, or extend the
> partition with some e2fsprogs/tune2fs trick? The assumption here is that
> no matter how big the MDT is, it will be exhausted someday.

It is true that the MDT device can become full at some point, but this happens fairly rarely, given that most Lustre HPC users have very large files and the size of the MDT is MUCH smaller than the space needed for the file data. The maximum size of an MDT is 8TB, and if you format the filesystem with "-i 2048" you can get 4B inodes therein, which is the maximum. Even the largest filesystem we have seen doesn't use that many inodes.

Once ZFS backing filesystems are available, this fixed inode limit will be gone (for all practical purposes). ZFS will allow up to 2^48 files per fileset, and with CMD (when it finally arrives) it will allow multiple MDTs in a single Lustre filesystem.

> One possible solution is simply to add/create another filesystem, with
> another MGS/MDT. But the question persists :)

If you are using LVM you can increase the size of the MDT device and resize the filesystem to add more inodes to it.

> And one more thing - I use a combined MGS/MDT. What about MGS size? That
> is, if I use a separate MGS and MDT, what size should the MGS have, and
> how does the management service work with respect to its block-device
> storage?

The MGS needs only some MB of space; maybe 128MB is the most it would ever need.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
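To make the inode arithmetic explicit, and to show where the "-i 2048" goes at format time (the device path and fsname below are only placeholders): one inode per 2048 bytes of MDT space on an 8TB device gives

  8 TB / 2048 bytes per inode = 2^43 / 2^11 = 2^32, i.e. about 4 billion inodes,

which is the per-filesystem inode maximum Andreas mentions. The option is passed through to the backing filesystem, along the lines of:

  mkfs.lustre --fsname=testfs --mgs --mdt --mkfsoptions="-i 2048" /dev/vg0/mdt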
Brian J. Murrell
2009-Oct-20 16:33 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On Tue, 2009-10-20 at 10:15 -0600, Andreas Dilger wrote:
>
> It is true that the MDT device can become full at some point, but this
> happens fairly rarely, given that most Lustre HPC users have very large
> files and the size of the MDT is MUCH smaller than the space needed for
> the file data.

Indeed. For some (very) anecdotal experience, witness my own very small Lustre filesystem usage:

$ lfs df
UUID                 1K-blocks      Used  Available  Use%  Mounted on
mds1_UUID             18348668   1327240   15972852    7%  /mnt/lustre[MDT:0]
client-OST0000_UUID   20642428  14883944    4709844   72%  /mnt/lustre[OST:0]
client-OST0001_UUID   20642428  14908260    4685528   72%  /mnt/lustre[OST:1]
client-OST0002_UUID   20642428  15055492    4538296   72%  /mnt/lustre[OST:2]
client-OST0003_UUID   20642428  14905716    4688072   72%  /mnt/lustre[OST:3]
client-OST0004_UUID   20642428  14871520    4722268   72%  /mnt/lustre[OST:4]

filesystem summary:  103212140  74624932   23344008   72%  /mnt/lustre

$ lfs df -i
UUID                    Inodes     IUsed      IFree  IUse%  Mounted on
mds1_UUID              5242880   2109580    3133300    40%  /mnt/lustre[MDT:0]
client-OST0000_UUID    1310720    208666    1102054    15%  /mnt/lustre[OST:0]
client-OST0001_UUID    1310720    538201     772519    41%  /mnt/lustre[OST:1]
client-OST0002_UUID    1310720    388754     921966    29%  /mnt/lustre[OST:2]
client-OST0003_UUID    1310720    292766    1017954    22%  /mnt/lustre[OST:3]
client-OST0004_UUID    1310720    469037     841683    35%  /mnt/lustre[OST:4]

filesystem summary:    5242880   2109580    3133300    40%  /mnt/lustre

As you can see, my MDT, at just less than 20% of the size of my total OST storage, is quite oversized (by 10x perhaps?) for the data I am storing, and I store lots of small files -- Lustre, kernel and other misc source trees. I don't do any striping, however, which helps keep MDT usage lower.

> If you are using LVM you can increase the size of the MDT device and
> resize the filesystem to add more inodes to it.

Ahhh. I don't think I knew that resizing actually increased inode counts.

Just for the experience of it (and when I can find a moment to do it), I will probably transplant my MDT into a newly created device (on LVM of course, given I do everything on LVM), much smaller than my current one. Indeed, I could try just shrinking the existing one, but I want to create a new one from scratch, complete with a mountconf-style UUID, and move the MDT data into it.

b.
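For what it is worth, the rough shape of such a transplant would be a file-level backup and restore of the MDT. The sketch below is untested, and the LV names, sizes and fsname are placeholders, so treat it as an outline rather than a recipe (with a smaller target you cannot simply dd the old device across, and the manual's MDT backup/restore section covers details such as config regeneration that are skipped here):

# with the MDT unmounted from Lustre, back it up at the file level,
# preserving the extended attributes that hold the Lustre metadata
mount -t ldiskfs /dev/vg0/old_mdt /mnt/old_mdt
cd /mnt/old_mdt
getfattr -R -d -m '.*' -e hex -P . > /tmp/mdt_ea.bak
tar czf /tmp/mdt_backup.tgz --sparse .
cd /; umount /mnt/old_mdt

# create and format the new, smaller MDT, then restore files and EAs
lvcreate -L 4G -n new_mdt vg0
mkfs.lustre --fsname=testfs --mgs --mdt /dev/vg0/new_mdt
mount -t ldiskfs /dev/vg0/new_mdt /mnt/new_mdt
cd /mnt/new_mdt
tar xzpf /tmp/mdt_backup.tgz
setfattr --restore=/tmp/mdt_ea.bak
cd /; umount /mnt/new_mdt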
Nirmal Seenu
2009-Oct-21 18:06 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
Could you please let us know the correct procedure to grow an MDT partition that is on an LVM volume? Do I use resize2fs after I add more extents to the MDT volume using the "lvextend" command?

I am using Lustre 1.8.0.1 + e2fsprogs-1.40.11.sun1-0redhat.x86_64, and the resize2fs version is 1.40.11.sun1 (17-June-2008).

Thanks
Nirmal
Brian J. Murrell
2009-Oct-21 18:14 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On Wed, 2009-10-21 at 13:06 -0500, Nirmal Seenu wrote:
> Could you please let us know the correct procedure to grow an MDT
> partition that is on an LVM volume?
>
> Do I use resize2fs after I add more extents to the MDT volume using the
> "lvextend" command?

Yes, that's the theory.

> I am using Lustre 1.8.0.1 + e2fsprogs-1.40.11.sun1-0redhat.x86_64, and
> the resize2fs version is 1.40.11.sun1 (17-June-2008).

Yes; best be sure you are using the latest e2fsprogs we have released. Of course, I would be remiss not to point out that we DO NOT test this feature _at_all_ and that you should have a good, tested backup on hand, just in case. Personally, I would also take the performance hit and make a snapshot, just to add a belt to my suspenders. I would remove the snapshot once I was happy with the result.

b.
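Spelling that out as commands, for the archives: the device, mount point and sizes below are placeholders, and as noted above this path is untested, so keep that backup handy. The MDT should be unmounted from Lustre while the offline resize runs:

# take an LVM snapshot as a safety net before touching anything
lvcreate -s -L 2G -n mdt_snap /dev/vg0/mdt

# grow the LV, check the filesystem, then grow it into the new space
lvextend -L +10G /dev/vg0/mdt
e2fsck -f /dev/vg0/mdt
resize2fs /dev/vg0/mdt

# remount the MDT and, once happy with the result, drop the snapshot
mount -t lustre /dev/vg0/mdt /mnt/mdt
lvremove /dev/vg0/mdt_snap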
Bernd Schubert
2009-Oct-23 09:51 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On Tuesday 20 October 2009, Andreas Dilger wrote:
> On 18-Oct-09, at 16:04, Piotr Wadas wrote:
> > Now, I did a simple calculation of MDT size as described in the Lustre
> > 1.8.1 manual and set up the MDT as recommended. The question is: whether
> > or not my calculation was right, what actually happens if the MDT
> > partition runs out of space? Is there any chance to dump the whole
> > combined MGS+MDT filesystem, supply a bigger block device, or extend the
> > partition with some e2fsprogs/tune2fs trick? The assumption here is that
> > no matter how big the MDT is, it will be exhausted someday.
>
> It is true that the MDT device can become full at some point, but this
> happens fairly rarely, given that most Lustre HPC users have very large
> files and the size of the MDT is MUCH smaller than the space needed for
> the file data. The maximum size of an MDT is 8TB, and if you format the

Is that still true with recent kernels such as the one from SLES11? I thought ldiskfs is based on ext4 there? So we should have at least 16TiB, and I'm not sure whether all the e2fsprogs patches needed for 64-bit maximum sizes have already landed?

Thanks,
Bernd

--
Bernd Schubert
DataDirect Networks
Andreas Dilger
2009-Oct-23 10:57 UTC
[Lustre-discuss] 1.8.1 test setup achieved, what about maximum mdt size
On 2009-10-23, at 03:51, Bernd Schubert wrote:
> On Tuesday 20 October 2009, Andreas Dilger wrote:
>> On 18-Oct-09, at 16:04, Piotr Wadas wrote:
>>> Now, I did a simple calculation of MDT size as described in the Lustre
>>> 1.8.1 manual and set up the MDT as recommended. The question is: whether
>>> or not my calculation was right, what actually happens if the MDT
>>> partition runs out of space? Is there any chance to dump the whole
>>> combined MGS+MDT filesystem, supply a bigger block device, or extend the
>>> partition with some e2fsprogs/tune2fs trick? The assumption here is that
>>> no matter how big the MDT is, it will be exhausted someday.
>>
>> It is true that the MDT device can become full at some point, but this
>> happens fairly rarely, given that most Lustre HPC users have very large
>> files and the size of the MDT is MUCH smaller than the space needed for
>> the file data. The maximum size of an MDT is 8TB, and if you format the
>
> Is that still true with recent kernels such as the one from SLES11? I
> thought ldiskfs is based on ext4 there? So we should have at least 16TiB,
> and I'm not sure whether all the e2fsprogs patches needed for 64-bit
> maximum sizes have already landed?

16TB LUN support is still under testing, so it isn't officially supported yet. The upstream e2fsprogs don't have 64-bit support finished yet (also under testing), and when that is done there will need to be additional testing with Lustre. There is some question of whether SLES11 will get all of the fixes needed for >16TB support, or if it is better to get that from RHEL6 instead.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Piotr Wadas
2009-Nov-07 01:01 UTC
[Lustre-discuss] 1.8.1.1 too :) # Re: 1.8.1 test setup achieved, what about maximum mdt size
[..]
> As you can see, my MDT, at just less than 20% of the size of my total
> OST storage, is quite oversized (by 10x perhaps?) for the data I am
> storing, and I store lots of small files -- Lustre, kernel and other
> misc source trees. I don't do any striping, however, which helps keep
> MDT usage lower.
>
> > If you are using LVM you can increase the size of the MDT device and
> > resize the filesystem to add more inodes to it.
>
> Ahhh. I don't think I knew that resizing actually increased inode
> counts.
>
> Just for the experience of it (and when I can find a moment to do it),
> I will probably transplant my MDT into a newly created device (on LVM
> of course, given I do everything on LVM), much smaller than my current
> one. Indeed, I could try just shrinking the existing one, but I want to
> create a new one from scratch, complete with a mountconf-style UUID,
> and move the MDT data into it.

Well, I'm happy to report the test setup "upgraded" to 1.8.1.1 ;-)

Now, regarding this topic - I am experimenting with the Lustre quota features. How does MDT usage relate to the storage of quota information? It seems quota is still based on quota v1/v2 files somehow, although, as described in chapter 9.1 of the Lustre Manual, the usrquota and grpquota mount options are obsolete on the filesystem client. I do not expect surprises here - the space needed to store quota information is probably not worth mentioning - but it is interesting how it is actually stored. Is it some classic per-target quota combined with a live per-OST summary?

Regards,
DT
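PS. For reference, the quota workflow I am experimenting with looks roughly like the commands below. The fsname, user name, mount point and limits are placeholders, and the parameter names are my reading of the 1.8 manual, so treat them as assumptions rather than gospel.

# enable user/group quota enforcement on the servers (run from the MGS)
lctl conf_param testfs.mdt.quota_type=ug
lctl conf_param testfs.ost.quota_type=ug

# build the quota files by scanning the filesystem (run on a client)
lfs quotacheck -ug /mnt/lustre

# set block (KB) and inode soft/hard limits for a user, then check usage
lfs setquota -u someuser -b 307200 -B 309200 -i 10000 -I 11000 /mnt/lustre
lfs quota -u someuser /mnt/lustre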