In Lustre 2.x, what is the largest number of files that we could possibly have?

I noticed that mkfs.lustre on the MDT passes the following parameters to mkfs.ext2:

    -i 4096 -I 512

Can these params be smaller?

Can we get more inodes if we use zfs?

Thanks.

Roger Spellman
Staff Engineer
Terascala, Inc.
508-588-1501
www.terascala.com <http://www.terascala.com/>
On 2010-10-18, at 14:47, Roger Spellman wrote:
> In Lustre 2.x, what is the largest number of files that we could possibly have?
>
> I noticed that mkfs.lustre on the MDT passes the following parameters to mkfs.ext2: -i 4096 -I 512

Do you mean the maximum OST size (as mentioned in the subject) or the maximum MDT size (above)? For the ext4-based ldiskfs the maximum size is 16TB and 4B inodes (this is listed in the manual).

> Can these params be smaller?

For the MDT, yes, you could potentially use "-i 1500" as about the minimum space per inode, but then you risk running out of space in the filesystem before running out of inodes. The "-I 512" parameter controls the size of the inode itself, which holds the xattrs. If there are only single-striped files and no use of ACLs, user_xattrs, etc., then you might get by with "-I 256", but if this xattr space is exceeded then each such inode will consume 4096 bytes of space and also be slower to access.

> Can we get more inodes if we use zfs?

Definitely yes.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
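As a rough illustration of how the "-i" ratio and the 4B inode limit interact, the sketch below estimates the inode count for a given MDT size. It assumes that "16TB" means 16 TiB and that the inode count is simply the device size divided by the bytes-per-inode ratio, capped at 2^32; the real mke2fs rounds per block group, so treat the numbers as approximate. The function and constant names are made up for this example.

```python
# Approximate inode count for an ldiskfs MDT (illustrative only).
# Assumption: inodes ~= device size / bytes-per-inode, capped at 2**32.

MAX_LDISKFS_INODES = 2**32  # the "4B inodes" limit quoted in the thread
TIB = 2**40                 # assumes the thread's "TB" means TiB


def estimated_inodes(fs_bytes, bytes_per_inode):
    """Return the estimated number of inodes for a device of fs_bytes."""
    return min(fs_bytes // bytes_per_inode, MAX_LDISKFS_INODES)


if __name__ == "__main__":
    for ratio in (4096, 2048, 1536):
        print(
            f"-i {ratio}: "
            f"1 TiB -> {estimated_inodes(1 * TIB, ratio):,} inodes, "
            f"16 TiB -> {estimated_inodes(16 * TIB, ratio):,} inodes"
        )
```

Under these assumptions a 16 TiB MDT already reaches the 2^32 cap at the default "-i 4096", so a smaller ratio would only add inodes on MDTs smaller than that.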
Sorry, I meant max MDT size (I changed the subject).

I guess that my questions are:

1. What is the maximum number of files I can get on an MDT with ldiskfs?
2. What parameters need to be modified to achieve this?
3. What is the maximum number of files I can get on an MDT with ZFS?
4. What parameters need to be modified to achieve this?
On 2010-10-19, at 08:27, Roger Spellman wrote:
> I don't understand this comment:
>> For the MDT, yes, you could potentially use "-i 1500" as about the
>> minimum space per inode, but then you risk running out of space in the
>> filesystem before running out of inodes.
>
> If we set -I to 512, then on an MDT, what else is there that would
> require 1500 bytes per inode?

With "-I 512" the actual inode will consume 512 bytes, so with "-i 1536" there would be 1024 bytes per inode of block space still available. That extra space is needed for everything else in the filesystem, including the journal, directory blocks, Lustre metadata (last_rcvd, distributed transaction logs, etc.), and any external xattr blocks for widely-striped files (beyond 12 stripes or so).

> Just ACLs and striping? If there are no ACLs, and all files are single-striped, then could both -i and -I be set to the same value, say 512?

No, this will cause mke2fs to fail. There needs to be some free space in the filesystem for the above filesystem/Lustre metadata. In any case, since the maximum number of inodes is 2^32, the total filesystem size is not the limiting factor.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
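The arithmetic behind the "-i 1536" / "-I 512" example can be sketched as follows. This is only an illustration of the subtraction described in the message above, not a model of what mke2fs does internally; the function names are invented for the example.

```python
# Block space left per inode once the inode itself is accounted for.
# bytes_per_inode corresponds to "-i", inode_size to "-I".


def block_space_per_inode(bytes_per_inode, inode_size):
    """Bytes per inode left for journal, directories, xattr blocks, etc."""
    return bytes_per_inode - inode_size


def block_space_fraction(bytes_per_inode, inode_size):
    """Fraction of the device left for everything other than inodes."""
    return block_space_per_inode(bytes_per_inode, inode_size) / bytes_per_inode


if __name__ == "__main__":
    for ratio, size in ((4096, 512), (1536, 512), (512, 512)):
        left = block_space_per_inode(ratio, size)
        frac = block_space_fraction(ratio, size)
        print(f"-i {ratio} -I {size}: {left} bytes left per inode ({frac:.0%})")
```

With "-i 512 -I 512" the remaining space per inode is zero, which matches the point that there would be no room left for the filesystem and Lustre metadata.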
On Wednesday, October 20, 2010, Andreas Dilger wrote:
> On 2010-10-19, at 08:27, Roger Spellman wrote:
> > If we set -I to 512, then on an MDT, what else is there that would
> > require 1500 bytes per inode?
>
> With "-I 512" the actual inode will consume 512 bytes, so with
> "-i 1536" there would be 1024 bytes per inode of block space still
> available. That extra space is needed for everything else in the
> filesystem, including the journal, directory blocks, Lustre metadata
> (last_rcvd, distributed transaction logs, etc.), and any external xattr
> blocks for widely-striped files (beyond 12 stripes or so).

I have to admit, I entirely fail to understand why we should need 2/3 of the filesystem reserved for real file data.

- journal: 400MB -> negligible with recent decent MDT sizes (1TiB+)
- directory blocks: maybe, but I have not noticed any system where that takes more than 5%
- Lustre metadata (last_rcvd, distributed transaction logs, etc.) -> negligible with recent decent MDT sizes
- external xattr for Lustre lov and additional ACLs: maybe, depends on the customer

With the default -i 4096, it looks like this for most customers I know of:

    df -h:  973G   57G  861G   7% /lustre/lustre/mdt
    df -ih: 278M  248M   31M  89% /lustre/lustre/mdt

So doubling the number of inodes with -i 2048, or even quadrupling it with -i 1024, seems advisable.

Cheers, Bernd
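To put numbers on the df output above, the sketch below works out how much block space each used inode consumes on that MDT. It assumes that df's "G" and "M" suffixes are powers of 1024, that the MDT was formatted with "-i 4096 -I 512", and that the 973G total excludes the inode tables; the last assumption is at least consistent with the figures, since 278M inodes times (4096 - 512) bytes is exactly 973 GiB. All names are invented for the example.

```python
# Rough analysis of the df figures quoted in the message above.
# Assumptions: binary units, "-i 4096 -I 512", inode tables not in df totals.

GIB = 2**30
MIB = 2**20

used_bytes = 57 * GIB      # df -h "Used"
total_inodes = 278 * MIB   # df -ih "Inodes"
used_inodes = 248 * MIB    # df -ih "IUsed"
inode_size = 512           # "-I 512"


def block_bytes_per_used_inode(used, inodes):
    """Average block space consumed per used inode, outside the inode itself."""
    return used / inodes


def observed_bytes_per_file(used, inodes, isize):
    """Average total footprint per file: inode plus its share of block space."""
    return isize + block_bytes_per_used_inode(used, inodes)


if __name__ == "__main__":
    per_inode = block_bytes_per_used_inode(used_bytes, used_inodes)
    per_file = observed_bytes_per_file(used_bytes, used_inodes, inode_size)
    print(f"block space per used inode: {per_inode:.0f} bytes")
    print(f"total footprint per file:   {per_file:.0f} bytes")
    # Consistency check of the assumed format options against the df total.
    print(total_inodes * (4096 - inode_size) == 973 * GIB)
```

On these figures each file uses roughly 235 bytes of block space on top of its 512-byte inode, about 750 bytes in total, which is below both "-i 1024" and "-i 2048". This is a single sample; widely-striped files or heavy ACL use would raise the per-file footprint.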