We typically debate whether to create one large OST or several smaller OSTs (100 GB). We prefer large OSTs (1 TB) because of the ease of management, but how would fsck work? Would it take a long time? Also, if a large OST is preferred, is it possible to consolidate smaller OSTs into a larger one? TIA
Hi Mag,

fsck'ing a Lustre volume doesn't take any more or less time than fsck'ing a traditional ext2/ext3 volume. I've had to run fsck a few times over the years on 2 TB volumes on a DDN SAN, and depending on how much needs correcting, it usually takes about 15-20 minutes from start to finish.

The only real constraint with OST size is the Linux maximum file system size (2 TB, if memory serves). I don't know if there's a performance benefit or penalty to having multiple, smaller OSTs ... likely Andreas will be able to shed some light.

hth,
Klaus
On Oct 20, 2008 13:16 -0700, Steden Klaus wrote:
> fsck'ing a Lustre volume doesn't take any more or less time than
> fsck'ing a traditional ext2/ext3 volume.

That isn't quite true. Lustre uses extents in ext3 and also the "uninit_groups" feature (both merged into the upstream kernel in ext4), which can significantly reduce e2fsck times because much less metadata is read from the disk (which can be slow and seeky).

> I've had to run fsck a few times over the years on 2 TB volumes on a
> DDN SAN, and depending on how much needs correcting, it usually takes
> about 15-20 minutes from start to finish.

In the past it used to take about 1 hour to run e2fsck on 1 TB, but this can be as low as 5 minutes with Lustre filesystems today, especially if the uninit_groups feature is enabled.

> The only real constraint with OST size is the Linux maximum file system
> size (2 TB, if memory serves). I don't know if there's a performance
> benefit or penalty to having multiple, smaller OSTs ... likely Andreas
> will be able to shed some light.

The current maximum OST size is 8 TB. We are testing with 16 TB on RHEL5 kernels, though that testing isn't finished yet, and we are also working to back-port fixes to SLES10 to allow 16 TB OSTs there as well.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
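As a minimal sketch of checking for this feature and running a full check on an OST, assuming the OST is backed by /dev/sdb1 (a made-up device name here) and has been stopped and unmounted first:

    # list the features enabled on the backing filesystem; look for
    # "uninit_groups" (named "uninit_bg" in upstream ext4/e2fsprogs)
    dumpe2fs -h /dev/sdb1 | grep -i features

    # force a full check (-f) and automatically fix anything that can
    # be repaired safely without prompting (-p)
    e2fsck -fp /dev/sdb1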
Can you be more specific about the 8 TB limit? Is it correct that this is an 8 TB base-2 limit (8796 GB base 10)? Does this limit apply to the raw device size or to the formatted Lustre OST size? Using RHEL5U1, we have created an OST on a 9.001 TB raw device with a formatted size reported by lustre_config of 8584023 MB. Is this OST OK to use, or would we have received an error message if it was not OK?

Thanks,
Bob

> The current maximum OST size is 8 TB. We are testing with 16 TB on RHEL5
> kernels, though that testing isn't finished yet, and we are also working
> to back-port fixes to SLES10 to allow 16 TB OSTs there as well.
On Tue, 2008-10-21 at 12:00 -0400, Kossey, Robert wrote:
> Can you be more specific about the 8 TB limit? Is it correct that this
> is an 8 TB base-2 limit (8796 GB base 10)?

Probably it would be useful, here and in our documentation, to observe the IEC standard for base-2 and base-10 naming and to be explicit about which we mean:

http://www.iec.ch/zone/si/si_bytes.htm

This is not to imply that our documentation currently adheres to one or the other (or is consistent), but that clarification and consistency would be a good enhancement. So the question is: is the limit 8 TiB or 8 TB?

> Does this limit apply to the raw device size or to the formatted Lustre
> OST size?

I think it's the raw disk size, because this limit is imposed by the ext3 filesystem. According to http://en.wikipedia.org/wiki/Ext3#Size_limits the limit is indeed 8 TiB and not 8 TB.

> Using RHEL5U1, we have created an OST on a 9.001 TB

TiB or TB? Respectively, those amount to 9,896,704,161,611.776 and 9,001,000,000,000 bytes. Converted to TiB, they are 9.001 and 8.18636 TiB (respectively), both of which are more than 8 TiB.

> Is this OST OK to use, or would we have received an error message
> if it was not OK?

I'm not sure what the consequences of using an OST > 8 TiB are, to be honest.

b.
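The arithmetic above is easy to reproduce with bc (the only assumption being that GNU bc is installed):

    # 9.001 TiB in bytes: 9.001 * 1024^4
    echo '9.001 * 1024^4' | bc
    9896704161611.776

    # 9.001 TB (base 10) in TiB: 9,001,000,000,000 / 1024^4
    echo 'scale=5; 9001000000000 / 1024^4' | bc
    8.18636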
The original case raw size was 9.001 TB (9,001,000,000,000 bytes), but the (ext3?) formatted size reported by lustre_config was 8584023 MB, which is less than 8 TiB. However, we have tried another case with a raw size of 11.0 TB, for which the formatted size is well above 8 TiB:

    root> lfs df -h
    UUID                   bytes    Used  Available  Use%  Mounted on
    testfs-MDT0000_UUID   717.7G  472.6M     676.2G    0%  /testfs[MDT:0]
    testfs-OST0000_UUID     9.8T  436.5M       9.3T    0%  /testfs[OST:0]

    filesystem summary:     9.8T  436.5M       9.3T    0%  /testfs

So is it the case that > 8 TiB is possible to create without errors, but remains unsupported?

Bob

> > Using RHEL5U1, we have created an OST on a 9.001 TB
>
> TiB or TB? Respectively, those amount to 9,896,704,161,611.776 and
> 9,001,000,000,000 bytes. Converted to TiB, they are 9.001 and 8.18636
> TiB (respectively), both of which are more than 8 TiB.
>
> > Is this OST OK to use, or would we have received an error message
> > if it was not OK?
>
> I'm not sure what the consequences of using an OST > 8 TiB are, to be
> honest.
>
> b.
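For anyone wanting to run the same comparison on their own system, a quick sketch of checking the raw device size against the 8 TiB line (the device path is a placeholder):

    # raw size of the backing block device, in bytes
    blockdev --getsize64 /dev/sdb1

    # 8 TiB in bytes, for comparison
    echo $((8 * 1024**4))       # 8796093022208

    # formatted size as Lustre reports it (as in the output above)
    lfs df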
On Oct 21, 2008 12:00 -0400, Kossey, Robert wrote:
> Can you be more specific about the 8 TB limit? Is it correct that this
> is an 8 TB base-2 limit (8796 GB base 10)?

All filesystem limits are a result of base-2 issues.

> Does this limit apply to the raw device size or to the formatted Lustre
> OST size? Using RHEL5U1, we have created an OST on a 9.001 TB raw device
> with a formatted size reported by lustre_config of 8584023 MB. Is this
> OST OK to use, or would we have received an error message if it was not OK?

For RHEL5 it is possible that a 9 TB filesystem will work (I believe a fix for this went into 1.6.6); we just haven't finished our testing on it yet. There is no chance that a >= 8 TiB filesystem will work with SLES10 currently.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Oct 28, 2008 11:01 -0400, Kossey, Robert wrote:
> The original case raw size was 9.001 TB (9,001,000,000,000 bytes), but
> the (ext3?) formatted size reported by lustre_config was 8584023 MB,
> which is less than 8 TiB. However, we have tried another case with a raw
> size of 11.0 TB, for which the formatted size is well above 8 TiB:
>
>     root> lfs df -h
>     UUID                   bytes    Used  Available  Use%  Mounted on
>     testfs-MDT0000_UUID   717.7G  472.6M     676.2G    0%  /testfs[MDT:0]
>     testfs-OST0000_UUID     9.8T  436.5M       9.3T    0%  /testfs[OST:0]
>
>     filesystem summary:     9.8T  436.5M       9.3T    0%  /testfs
>
> So is it the case that > 8 TiB is possible to create without errors, but
> remains unsupported?

That is correct. It may expose you to data corruption errors, or it may not; we aren't sure yet because we haven't done testing beyond 8 TB.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.