We typically debate whether to create one large OST or several smaller OSTs (100 GB). We prefer large OSTs (1 TB) because of the ease of management, but how would fsck work? Would it take a long time? Also, if a large OST is preferred, is it possible to consolidate smaller OSTs into a larger one? TIA
Hi Mag,

fsck'ing a Lustre volume doesn't take any more or less time than fsck'ing a traditional ext2/ext3 volume. I've had to run fsck a few times over the years on 2 TB volumes on a DDN SAN, and depending on how much needs correcting, it usually takes about 15-20 minutes from start to finish.

The only real constraint with OST size is the Linux maximum file system size (2 TB, if memory serves). I don't know if there's a performance benefit or penalty to having multiple, smaller OSTs ... likely Andreas will be able to shed some light.

hth,
Klaus
On Oct 20, 2008 13:16 -0700, Steden Klaus wrote:
> fsck'ing a Lustre volume doesn't take any more or less time than
> fsck'ing a traditional ext2/ext3 volume.

That isn't quite true. Lustre uses extents in ext3 and also the "uninit_groups" feature (both merged into the upstream kernel in ext4), which can significantly reduce e2fsck times because much less metadata is read from the disk (which can be slow and seeky).

> I've had to run fsck a few times over the years on 2 TB volumes on a
> DDN SAN, and depending on how much needs correcting, it usually takes
> about 15-20 minutes from start to finish.

In the past it used to take about 1 hour to run e2fsck on 1 TB, but this can be as low as 5 minutes with Lustre filesystems today, especially if the uninit_groups feature is enabled.

> The only real constraint with OST size is the Linux maximum file system
> size (2 TB, if memory serves). I don't know if there's a performance
> benefit or penalty to having multiple, smaller OSTs ... likely Andreas
> will be able to shed some light.

The current maximum OST size is 8 TB. We are testing with 16 TB on RHEL5 kernels, though that testing isn't finished yet, and we are also working to back-port fixes to SLES10 to allow 16 TB OSTs there as well.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
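As a minimal sketch of checking for this feature and running a full check on an OST, assuming the OST is backed by /dev/sdb1 (a made-up device name here) and has been stopped and unmounted first:

    # list the features enabled on the backing filesystem; look for
    # "uninit_groups" (named "uninit_bg" in upstream ext4/e2fsprogs)
    dumpe2fs -h /dev/sdb1 | grep -i features

    # force a full check (-f) and automatically fix anything that can
    # be repaired safely without prompting (-p)
    e2fsck -fp /dev/sdb1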
Can you be more specific about the 8 TB limit? Is it correct that this is an 8 TB base-2 limit (8796 GB base 10)? Does this limit apply to the raw device size or to the formatted Lustre OST size? Using RHEL5U1, we have created an OST on a 9.001 TB raw device with a formatted size reported by lustre_config of 8584023 MB. Is this OST OK to use, or would we have received an error message if it was not OK?

Thanks,
Bob

> The current maximum OST size is 8 TB. We are testing with 16 TB on RHEL5
> kernels, though that testing isn't finished yet, and we are also working
> to back-port fixes to SLES10 to allow 16 TB OSTs there as well.
On Tue, 2008-10-21 at 12:00 -0400, Kossey, Robert wrote:
> Can you be more specific about the 8 TB limit? Is it correct that this
> is an 8 TB base-2 limit (8796 GB base 10)?

Probably it would be useful, here and in our documentation, to observe the IEC standard for base-2 and base-10 naming and to be explicit about which we mean:

http://www.iec.ch/zone/si/si_bytes.htm

This is not to imply that our documentation currently adheres to one or the other (or is consistent), but that clarification and consistency would be a good enhancement. So the question is: is the limit 8 TiB or 8 TB?

> Does this limit apply to the raw device size or to the formatted Lustre
> OST size?

I think it's the raw disk size, because this limit is imposed by the ext3 filesystem. According to http://en.wikipedia.org/wiki/Ext3#Size_limits the limit is indeed 8 TiB and not 8 TB.

> Using RHEL5U1, we have created an OST on a 9.001 TB

TiB or TB? Respectively, those amount to 9,896,704,161,611.776 and 9,001,000,000,000 bytes. Converted to TiB, they are 9.001 and 8.18636 TiB (respectively), both of which are more than 8 TiB.

> Is this OST OK to use, or would we have received an error message
> if it was not OK?

I'm not sure what the consequences of using an OST > 8 TiB are, to be honest.

b.
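The arithmetic above is easy to reproduce with bc (the only assumption being that GNU bc is installed):

    # 9.001 TiB in bytes: 9.001 * 1024^4
    echo '9.001 * 1024^4' | bc
    9896704161611.776

    # 9.001 TB (base 10) in TiB: 9,001,000,000,000 / 1024^4
    echo 'scale=5; 9001000000000 / 1024^4' | bc
    8.18636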
The original case raw size was 9.001 TB (9,001,000,000,000 bytes), but the (ext3?) formatted size reported by lustre_config was 8584023 MB, which is less than 8 TiB. However, we have tried another case with a raw size of 11.0 TB, for which the formatted size is well above 8 TiB:

    root> lfs df -h
    UUID                   bytes    Used  Available  Use%  Mounted on
    testfs-MDT0000_UUID   717.7G  472.6M     676.2G    0%  /testfs[MDT:0]
    testfs-OST0000_UUID     9.8T  436.5M       9.3T    0%  /testfs[OST:0]

    filesystem summary:     9.8T  436.5M       9.3T    0%  /testfs

So is it the case that > 8 TiB is possible to create without errors, but remains unsupported?

Bob

> > Using RHEL5U1, we have created an OST on a 9.001 TB
>
> TiB or TB? Respectively, those amount to 9,896,704,161,611.776 and
> 9,001,000,000,000 bytes. Converted to TiB, they are 9.001 and 8.18636
> TiB (respectively), both of which are more than 8 TiB.
>
> > Is this OST OK to use, or would we have received an error message
> > if it was not OK?
>
> I'm not sure what the consequences of using an OST > 8 TiB are, to be
> honest.
>
> b.
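For anyone wanting to run the same comparison on their own system, a quick sketch of checking the raw device size against the 8 TiB line (the device path is a placeholder):

    # raw size of the backing block device, in bytes
    blockdev --getsize64 /dev/sdb1

    # 8 TiB in bytes, for comparison
    echo $((8 * 1024**4))       # 8796093022208

    # formatted size as Lustre reports it (as in the output above)
    lfs df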
On Oct 21, 2008 12:00 -0400, Kossey, Robert wrote:
> Can you be more specific about the 8 TB limit? Is it correct that this
> is an 8 TB base-2 limit (8796 GB base 10)?

All filesystem limits are a result of base-2 issues.

> Does this limit apply to the raw device size or to the formatted Lustre
> OST size? Using RHEL5U1, we have created an OST on a 9.001 TB raw device
> with a formatted size reported by lustre_config of 8584023 MB. Is this
> OST OK to use, or would we have received an error message if it was not OK?

For RHEL5 it is possible that a 9 TB filesystem will work (I believe a fix for this went into 1.6.6); we just haven't finished our testing on it yet. There is no chance that a >= 8 TiB filesystem will work with SLES10 currently.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
On Oct 28, 2008 11:01 -0400, Kossey, Robert wrote:
> The original case raw size was 9.001 TB (9,001,000,000,000 bytes), but
> the (ext3?) formatted size reported by lustre_config was 8584023 MB,
> which is less than 8 TiB. However, we have tried another case with a raw
> size of 11.0 TB, for which the formatted size is well above 8 TiB:
>
>     root> lfs df -h
>     UUID                   bytes    Used  Available  Use%  Mounted on
>     testfs-MDT0000_UUID   717.7G  472.6M     676.2G    0%  /testfs[MDT:0]
>     testfs-OST0000_UUID     9.8T  436.5M       9.3T    0%  /testfs[OST:0]
>
>     filesystem summary:     9.8T  436.5M       9.3T    0%  /testfs
>
> So is it the case that > 8 TiB is possible to create without errors, but
> remains unsupported?

That is correct. It may expose you to data corruption errors, or it may not; we aren't sure yet because we haven't done testing beyond 8 TB.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.