Dear Friends

My understanding of this parameter is:

It provides the mkfs.lustre command with a hint about the expected level of striping. The larger the (expected) stripe count, the larger the resulting inodes. This allows (more) EAs to be embedded in the inode, allowing one read to bring the inode including EAs into memory. Small inodes have the EAs located at the other end of a pointer, causing two seeks to bring the EAs into memory.

So much for the theory. With Lustre 1.8.1 the output from mkfs.lustre with no stripe-count-hint and the output with stripe-count-hint=160 is the same. The inode size in both cases appears to be 512 bytes (viz. -I 512). This result also holds in 1.8.1.1.

Is stripe-count-hint ignored, as appears to be the case?

It is possible to use --mkfsoptions="-I 4096 -i 4608" to force the creation of 4k inodes. (This is the largest permissible size.)

Would someone please confirm (or deny) that the larger inode size has the effect of allowing Lustre to handle inodes for heavily striped files more efficiently?

Regards
Geoff carrier
On Mon, 2009-11-16 at 21:03 +0000, Geoff Lustre wrote:
> My understanding of this parameter is:
>
> It provides the mkfs.lustre command with a hint about the expected
> level of striping.

Yes, indeed.

> The larger the (expected) stripe count, the larger the resulting inodes.

To a limit, yes. See bug 7240/7241.

> This allows (more) EAs to be embedded in the inode, allowing one read
> to bring the inode including EAs into memory.

Yes, that's the theory.

> Small inodes have the EAs located at the other end of a pointer,
> causing two seeks to bring the EAs into memory.

Correct.

> So much for the theory. With Lustre 1.8.1 the output from
> mkfs.lustre with no stripe-count-hint and the output with
> stripe-count-hint=160 is the same.

That's right, because 160 is at the other end of the scale.

> Is stripe-count-hint ignored, as appears to be the case?

No. The algorithm is as such:

if the stripe-count hint > 72, then 512 byte inode
if the stripe-count hint > 32, then 2048 byte inode
if the stripe-count hint > 10, then 1024 byte inode
otherwise, 1024 byte inode

Where the first match wins.

> It is possible to use --mkfsoptions="-I 4096 -i 4608" to force the
> creation of 4k inodes.

Yes.

> (This is the largest permissible size.)

Correct.

> Would someone please confirm (or deny) that the larger inode size has
> the effect of allowing Lustre to handle inodes for heavily striped
> files more efficiently?

It does, with a cost as outlined by bug 7240/7241.

b.
On Mon, 2009-11-16 at 16:26 -0500, Brian J. Murrell wrote:

A correction to my posting of yesterday...

> No. The algorithm is as such:
>
> if the stripe-count hint > 72, then 512 byte inode
> if the stripe-count hint > 32, then 2048 byte inode
> if the stripe-count hint > 10, then 1024 byte inode
> otherwise, 1024 byte inode
             ^^^^
This should be 512 byte inode.

Apologies for the cut'n'pasto.

b.
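For readers who prefer code, the corrected selection rule can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in this thread, not the actual mkfs.lustre source. The function name `inode_size_for_stripe_hint` is hypothetical, an absent hint is modelled as 0 (an assumption), and the thresholds are tested from highest to lowest with the first match taken, which is the reading consistent with a hint of 160 producing 512-byte inodes.

```python
def inode_size_for_stripe_hint(stripe_count_hint: int) -> int:
    """Return the inode size in bytes chosen for a stripe-count hint.

    Hypothetical helper mirroring the thresholds quoted in this thread,
    with the corrected 512-byte fallback.
    """
    if stripe_count_hint > 72:
        return 512   # very wide striping falls back to small inodes (see bug 7240/7241)
    if stripe_count_hint > 32:
        return 2048
    if stripe_count_hint > 10:
        return 1024
    return 512       # corrected default for small hints


if __name__ == "__main__":
    # Values on either side of each threshold, plus the 160 from the question.
    for hint in (0, 10, 11, 32, 33, 72, 73, 160):
        print(f"hint={hint:3d} -> {inode_size_for_stripe_hint(hint)} byte inode")
```

Under these thresholds a hint of 160 and a hint of 0 both select 512-byte inodes, which matches the identical mkfs.lustre output reported in the first message.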
Dear Brian

Thanks for the correction. Your point was made clearly enough though!

One more question, which I feel will be of interest to more than just me:

Any reason the inode maxes out at 2k? 4k is perfectly possible.

Regards
Geoff

On Tue, Nov 17, 2009 at 2:50 PM, Brian J. Murrell <Brian.Murrell at sun.com> wrote:
> On Mon, 2009-11-16 at 16:26 -0500, Brian J. Murrell wrote:
>
> A correction to my posting of yesterday...
>
> > No. The algorithm is as such:
> >
> > if the stripe-count hint > 72, then 512 byte inode
> > if the stripe-count hint > 32, then 2048 byte inode
> > if the stripe-count hint > 10, then 1024 byte inode
> > otherwise, 1024 byte inode
>              ^^^^
> This should be 512 byte inode.
>
> Apologies for the cut'n'pasto.
>
> b.
On 2009-11-17, at 09:13, Geoff Lustre wrote:
> Thanks for the correction. Your point was made clearly enough
> though! One more question which I feel will be of interest to more
> than just me.
>
> Any reason the inode maxes out at 2k? 4k is perfectly possible.

One problem that used to be hit with 4k inodes is that the block size is also 4k, and the default filesystem configuration is to have one inode for each block in the filesystem. That means the inode table would consume all of the blocks in the filesystem, leaving no space for other (meta)data like the journal, directories, etc. This used to cause mke2fs to spin forever trying to find free space in the filesystem.

Secondly, it is very uncommon to have a filesystem with a default stripe count of 160, so this is generally not hit.

Finally, even if the inode size is 4kB, this leaves only 4096-128 = 3968 bytes of space for extended attributes, which isn't enough to hold a fully striped file (160 stripes), and there will not be much/any space left in the filesystem to store external xattrs. Instead, it is better to create smaller inodes and let the small percentage of files that have a lot of stripes use external xattr blocks to store the striping data.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
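The first point above is simple arithmetic and can be checked with a short Python sketch. It assumes the inode table occupies roughly inode_size / bytes_per_inode of the filesystem, where bytes_per_inode is the mke2fs -i ratio, and it ignores all other metadata. The function name `inode_table_fraction` is hypothetical and the model is a simplification, not how mke2fs itself lays out the filesystem.

```python
def inode_table_fraction(inode_size: int, bytes_per_inode: int) -> float:
    """Approximate fraction of the filesystem taken by the inode table.

    One inode of `inode_size` bytes is allocated for every
    `bytes_per_inode` bytes of filesystem space (the mke2fs -i ratio).
    Simplifying assumption: bitmaps, group descriptors and the journal
    are ignored.
    """
    if inode_size <= 0 or bytes_per_inode <= 0:
        raise ValueError("sizes must be positive")
    return inode_size / bytes_per_inode


if __name__ == "__main__":
    cases = [
        (512, 4096),   # 512-byte inodes, one inode per 4k block
        (2048, 4096),  # 2k inodes, one inode per 4k block
        (4096, 4096),  # 4k inodes, one inode per 4k block
        (4096, 4608),  # the -I 4096 -i 4608 options from the first message
    ]
    for inode_size, ratio in cases:
        pct = 100 * inode_table_fraction(inode_size, ratio)
        print(f"-I {inode_size} -i {ratio}: inode table ~ {pct:.1f}% of the filesystem")
```

Under this approximation, 4k inodes at one inode per 4k block take the entire filesystem, and even -I 4096 -i 4608 leaves only about one ninth of the space for the journal, directories and external xattr blocks.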