On Mar 11, 2005 00:31 +0800, Carlos Ko wrote:
> My question is:
>
> When a file (or files) is striped over multiple OSSes and one of them
> goes down, is there any recovery for that file? Is it possible for
> Lustre to recover the data? If yes, any hints?
You are confusing two separate concepts here.
> I read this at http://www.clusterfs.com/compare.html:
>
> Multi-server file striping
> Break through traditional file system limits by striping a single file
> over multiple object storage servers. Striped over multiple servers,
> users can create files as large as 16 TB on 32-bit platforms, and nearly
> unlimited on 64-bit platforms.
>
> These striped files also harness the network bandwidth of many servers,
> allowing tremendous throughput when reading or writing a single file!
The Lustre file "striping" is the same as RAID 0 - a single copy
of each piece of data is put onto a single storage target (OST).
Different parts of the file may be put onto separate OSTs, depending
on the striping pattern you select. This does nothing to protect your
data at all. If an OST goes down and you do not have it running in
"recoverable" mode, you will get I/O errors reading from the parts of
the file that are on the down OST. If the disk(s) where the OST stores
its data go bad, then your data is lost. Lustre has no support for OST
data redundancy (e.g. RAID 1) at this time.
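To make the RAID 0 comparison concrete, here is a rough Python model of
which byte ranges of a striped file become unreadable when one OST is
down. It assumes a simplified layout (fixed stripe size, round-robin
placement over the file's stripe_count OSTs, one copy of each stripe);
the function names are hypothetical and this is not Lustre code.

```python
# Toy RAID-0 style striping model (assumption: fixed stripe size,
# round-robin placement, a single copy of each stripe, no redundancy).

def ost_for_offset(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Return the index (0..stripe_count-1) of the OST holding this byte."""
    return (offset // stripe_size) % stripe_count


def unreadable_ranges(file_size: int, stripe_size: int,
                      stripe_count: int, down_ost: int):
    """List the (start, end) byte ranges stored only on the failed OST."""
    ranges = []
    for start in range(0, file_size, stripe_size):
        if ost_for_offset(start, stripe_size, stripe_count) == down_ost:
            # The last stripe may be shorter than stripe_size.
            ranges.append((start, min(start + stripe_size, file_size)))
    return ranges


if __name__ == "__main__":
    MB = 1 << 20
    # 8 MB file, 1 MB stripes over 4 OSTs, and OST index 2 goes down:
    # every 4th stripe (bytes 2-3 MB and 6-7 MB) has no other copy.
    for start, end in unreadable_ranges(8 * MB, MB, 4, 2):
        print(f"I/O error for bytes {start}..{end - 1}")
```

Since every stripe exists exactly once, nothing in this layout can
reconstruct the missing ranges; protection has to come from the storage
underneath each OST.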
HOWEVER, Lustre works just fine on top of hardware and software RAID 1
or RAID 5 storage devices. That means if the backing storage for the
Lustre OST is itself redundant (MD RAID, RAID adapter, or external
RAID device) then the data will not be lost if a disk fails.
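For contrast, here is a toy sketch of why RAID 5 under an OST survives a
single disk failure: the parity block is the XOR of the data blocks, so
any one lost block is the XOR of the survivors. This is a simplified
model of one stripe only (hypothetical helper names); real MD RAID or
hardware controllers also rotate parity across disks and do much more.
RAID 1 is simpler still: it keeps a full second copy.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity block for one stripe: XOR of all the data blocks."""
    return xor_blocks(data_blocks)


def rebuild(surviving_blocks):
    """Recover the single missing block (data or parity) from the rest."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]   # three data disks
    parity = make_parity(data)           # fourth disk holds parity
    # The disk holding data[1] fails; rebuild it from the other three.
    recovered = rebuild([data[0], data[2], parity])
    assert recovered == data[1]
    print(recovered)
```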
> I read this page for Recovery of Lustre,
>
> https://wiki.clusterfs.com/lustre/RecoveryOverview
>
> It mainly discussed:
>
> - client (compute node) failure
> - MDS failure (and failover)
> - OST failure
> - transient network partition
The "recovery" being discussed here enables an MDS or OST service to be
restarted on the same node (or a different node if the storage hardware
supports it) if the node crashes or there are network problems. This
allows Lustre to handle MDS and (potentially, depending on configuration)
OST failures without the application noticing. This is recovery of the
SERVICE and has nothing to do with recovery of the DATA (which needs
RAID 1 or RAID 5 storage devices as mentioned previously).
Cheers, Andreas
--
Andreas Dilger