What is the disadvantage of creating an MDS partition with smaller inodes per block? Right now it's 4k per inode; what happens if we go to the smallest value, which is 1024 bytes? This would let us create more small files, which will lead to more inodes being used. But what is the downside?

TIA
On Thu, 2008-08-14 at 08:05 -0400, Mag Gam wrote:
> What is the disadvantage of creating an MDS partition with smaller
> inodes per block? Right now it's 4k per inode; what happens if we go
> to the smallest value, which is 1024 bytes?

You risk running out of room in the inode for EAs, requiring that
another block be allocated to hold the additional EAs and be linked
to the inode. As you can imagine, having to seek the disk to move from
the inode to the additional EA block has a performance penalty
associated with it.

> This would let us create more small files, which will lead to more
> inodes being used. But what is the downside?

Do you really have a use case where 4K inodes don't give you enough
files?

b.
Brian:

What is an EA?

Yes. We create 20k files per day. Multiply that by 360 days per year
(for our research); that's about 72000000 files per year.

We have 11 years of data: 792000000 files.

Our file sizes range from 5M to 70M (average).

I know it's crazy, but a professor or student will need any of these
yearly datasets at random. So far Lustre has been awesome; it's just
the inode issue.

TIA

On Wed, Aug 20, 2008 at 5:12 PM, Brian J. Murrell <Brian.Murrell at sun.com> wrote:
> You risk running out of room in the inode for EAs, requiring that
> another block be allocated to hold the additional EAs and be linked
> to the inode. As you can imagine, having to seek the disk to move from
> the inode to the additional EA block has a performance penalty
> associated with it.
>
> Do you really have a use case where 4K inodes don't give you enough
> files?
>
> b.
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
On Aug 20, 2008 22:14 -0400, Mag Gam wrote:
> What is an EA?

EA = extended attribute. This is how Lustre stores striping information
(location of file data on OSTs).

> Yes. We create 20k files per day. Multiply that by 360 days per year
> (for our research); that's about 72000000 files per year.
>
> We have 11 years of data: 792000000 files.

There are several Lustre filesystems with this many inodes on the MDS.
The only supported option for the blocksize is 4096 bytes/block. It
is strongly recommended to have 512-byte inodes. For the amount of
filesystem space per inode, the default is 4096 bytes per inode, but
it is possible to allocate less space than this, especially if you
know that you will not have many stripes per file.

Specifying "-i 2048" is not unreasonable (2048 bytes/inode). This
means 792M inodes * 2048 ~= 1.6TB for the MDS. Not at all unusual.

> Our file sizes range from 5M to 70M (average).

The average file size really only reflects the ratio between the MDS
and OST filesystem space. This means for 792M files you need between
about 4TB and 56TB of OST storage, probably 1 - 14 OSTs at 4TB each.
This is again not at all unusual.

> I know it's crazy, but a professor or student will need any of these
> yearly datasets at random. So far Lustre has been awesome; it's just
> the inode issue.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
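The MDS sizing arithmetic in the reply above can be reproduced with a short sketch. It assumes only what the message states: the filesystem reserves one inode per "bytes per inode" of space (the "-i" ratio), so holding N inodes needs roughly N times that ratio. The function name mds_space_bytes is made up for this illustration, and the result ignores journal and other filesystem overhead.

```python
def mds_space_bytes(file_count: int, bytes_per_inode: int) -> int:
    """Rough MDS filesystem size needed to provide file_count inodes.

    Assumption: one inode is created per bytes_per_inode of filesystem
    space, so the minimum size is simply the product of the two.
    """
    return file_count * bytes_per_inode


FILES = 792_000_000  # file count quoted in the thread

for ratio in (4096, 2048):  # the default ratio and the suggested "-i 2048"
    size = mds_space_bytes(FILES, ratio)
    print(f"-i {ratio}: about {size / 1e12:.2f} TB")
```

With the thread's figure of 792M files, halving the ratio from 4096 to 2048 halves the space the MDS needs for the same number of inodes.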
I suppose I can check the current settings by using tune2fs -l?

But after reading your post about the "lfs -i" discrepancy, I am a
little scared to dig this far into it. That post was a wake-up call on
how Lustre allocates and shows inode usage :-)

Thanks again

On Thu, Aug 21, 2008 at 6:58 AM, Andreas Dilger <adilger at sun.com> wrote:
> There are several Lustre filesystems with this many inodes on the MDS.
> The only supported option for the blocksize is 4096 bytes/block. It
> is strongly recommended to have 512-byte inodes. For the amount of
> filesystem space per inode, the default is 4096 bytes per inode, but
> it is possible to allocate less space than this, especially if you
> know that you will not have many stripes per file.
>
> Specifying "-i 2048" is not unreasonable (2048 bytes/inode). This
> means 792M inodes * 2048 ~= 1.6TB for the MDS. Not at all unusual.
>
> Cheers, Andreas
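As a rough illustration of checking the current settings, the sketch below parses text in the "Key: value" layout that tune2fs -l prints and derives the effective bytes-per-inode ratio from the "Inode count", "Block count", "Block size", and "Inode size" fields. The helper names and the SAMPLE values are invented for this example; they are not output from any real MDS.

```python
def parse_tune2fs(text: str) -> dict:
    """Split 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def inode_summary(text: str) -> dict:
    """Report inode size, block size, and the effective bytes-per-inode ratio."""
    f = parse_tune2fs(text)
    inode_count = int(f["Inode count"])
    block_count = int(f["Block count"])
    block_size = int(f["Block size"])
    return {
        "inode_size": int(f["Inode size"]),
        "block_size": block_size,
        # Total filesystem bytes divided by the number of inodes.
        "bytes_per_inode": block_count * block_size / inode_count,
    }


# Made-up sample in the layout of tune2fs -l output (illustrative values only).
SAMPLE = """\
Inode count:              1048576
Block count:              1048576
Block size:               4096
Inode size:               512
"""

print(inode_summary(SAMPLE))
```

To use it on a real system, the text passed to inode_summary would be the captured output of tune2fs -l run against the MDS device; the ratio it reports is only an approximation of the value originally given to "-i".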