Displaying 20 results from an estimated 3000 matches similar to: "Balancing I/O Load"
2013 Apr 29
1
OSTs inactive on one client (only)
Hi everyone,
I have seen this question here before, but without a very
satisfactory answer. One of our half a dozen clients has
lost access to a set of OSTs:
> lfs osts
OBDS::
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID INACTIVE
3: lustre-OST0003_UUID INACTIVE
4: lustre-OST0004_UUID INACTIVE
5: lustre-OST0005_UUID ACTIVE
6: lustre-OST0006_UUID ACTIVE
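As an illustration only, the listing quoted above can be scanned for inactive targets with a short script. This is a minimal sketch: the sample text is copied from the post, while the helper name `inactive_osts` and the assumed line layout (`index: UUID STATE`) are hypothetical and are not taken from any Lustre tool or documentation.

```python
import re

# Sample text copied from the `lfs osts` output quoted in the post above.
SAMPLE = """\
OBDS::
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID INACTIVE
3: lustre-OST0003_UUID INACTIVE
4: lustre-OST0004_UUID INACTIVE
5: lustre-OST0005_UUID ACTIVE
6: lustre-OST0006_UUID ACTIVE
"""

# Assumed layout of each target line: "<index>: <uuid> <ACTIVE|INACTIVE>".
LINE_RE = re.compile(r"^\s*(\d+):\s+(\S+)\s+(ACTIVE|INACTIVE)\s*$")


def inactive_osts(text):
    """Return (index, uuid) pairs for every target reported as INACTIVE."""
    result = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        # Lines such as the "OBDS::" header do not match and are skipped.
        if match and match.group(3) == "INACTIVE":
            result.append((int(match.group(1)), match.group(2)))
    return result


if __name__ == "__main__":
    for index, uuid in inactive_osts(SAMPLE):
        print(index, uuid)
```

Running the same check on each client and comparing the results would show whether the inactive set really is limited to one machine, as the post describes.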
2008 Mar 04
16
Cannot send after transport endpoint shutdown (-108)
This morning I've had both my infiniband and tcp lustre clients hiccup. They are evicted from the server, presumably as a result of their high load and consequent timeouts. My question is: why don't the clients reconnect? The infiniband and tcp clients both give the following message when I type "df": Cannot send after transport endpoint shutdown (-108). I've
2008 Apr 15
4
NFS Performance
Hi,
With help from Oleg we got the right patches applied and NFS working
well. Maximum performance was about 60 MB/sec. Last week that dropped
to about 12.5 MB/sec and I cannot find a reason. Lustre clients all
obtain 100+ MB/sec on GigE. Each OST is good for 270 MB/sec. When
mounting the client on one of the OSSs I get 230 MB/sec. Seems the
speed is there. How can NFS and Lustre be tuned
2010 Aug 12
3
How to track down a latency/timing problem
Hello Lustre Experts
I am trying to solve a problem with very slow "ls" and other operations
that touch large numbers of files, despite good overall read/write rates.
We are running a small cluster of 3 OSSs with 9 OSTs, 1MDS (with SSD
MDT) and currently two clients. All server nodes are centos 5.2 with
lustre 1.8.1 while the clients are centos 5.4 with lustre 1.8.3. All
components are networked with DDR
2012 Oct 19
6
Large Corosync/Pacemaker clusters
Hi,
We're setting up fairly large Lustre 2.1.2 filesystems, each with 18
nodes and 159 resources all in one Corosync/Pacemaker cluster as
suggested by our vendor. We're getting mixed messages from our vendor
and others on how large a Corosync/Pacemaker cluster will work well.
1. Are there Lustre Corosync/Pacemaker clusters out there of this
size or larger?
2.
2007 Nov 23
2
How to remove OST permanently?
All,
I've added a new 2.2 TB OST to my cluster easily enough, but this new
disk array is meant to replace several smaller OSTs that I used to have,
which were only 120 GB, 500 GB, and 700 GB.
Adding an OST is easy, but how do I REMOVE the small OSTs that I no
longer want to be part of my cluster? Is there a command to tell Lustre
to move all the file stripes off one of the nodes?
2007 Dec 13
4
Lustre drawback
Hello everybody,
at the following pages:
http://www.rit.edu/~rc/docs/Survey_of_Clustered_Parallel_File_Systems_004_LANL.ppt
http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/tools/threading/238284.htm?page=2
I read:
"[...] Currently, one additional drawback to Lustre is that a Lustre
client cannot be on a server that is providing OSTs. This solution is
being worked on and may be
2007 Nov 26
15
bad 1.6.3 striped write performance
Hi,
I'm seeing what can only be described as dismal striped write
performance from lustre 1.6.3 clients :-/
1.6.2 and 1.6.1 clients are fine. 1.6.4rc3 clients (from cvs a couple
of days ago) are also terrible.
the below shows that the OS (centos4.5/5) or fabric (gigE/IB) or lustre
version on the servers doesn't matter - the problem is with the 1.6.3
and 1.6.4rc3 client kernels
2007 Mar 20
15
How to bypass failed OST without blocking?
Hi
I want Lustre to behave as follows when an OST fails: if a file
has stripe data on the failed OST, any operation on that file should
return an I/O error without blocking, while I can still
create and read/write new files, or read/write files that have no stripe
data on the failed OST, without blocking.
What should I do? How do I configure this?
thanks!
swin
2010 Aug 17
18
write RPC & congestion
Hi, thanks for previous help.
I have some questions about Lustre RPC and the sequence of events that
occur during large concurrent write() calls involving many processes and a
large data size per process. I understand there is a mechanism of flow
control by credits, but I'm a little unclear on how it works in general
after reading the "networking & io protocol" white paper.
Is
2008 Mar 07
2
Multihomed question: want Lustre over IB and Ethernet
Chris,
Perhaps you need to perform some write_conf-like command. I'm not sure if this is needed in 1.6 or not.
Shane
----- Original Message -----
From: lustre-discuss-bounces at lists.lustre.org <lustre-discuss-bounces at lists.lustre.org>
To: lustre-discuss <lustre-discuss at lists.lustre.org>
Sent: Fri Mar 07 12:03:17 2008
Subject: Re: [Lustre-discuss] Multihomed
2004 Jan 11
3
Lustre 1.0.2 packages available
Greetings--
Packages for Lustre 1.0.2 are now available in the usual place
http://www.clusterfs.com/download.html
This bug-fix release resolves a number of issues, of which a few are
user-visible:
- the default debug level is now a more reasonable production value
- zero-copy TCP is now enabled by default, if your hardware supports it
- you should encounter fewer allocation failures
2007 Nov 30
1
lustre-1.8 OSD
lustre-1.8 has OSD structures in place; what do I need to add to make it
work with the OSD T10 standard? Could anybody point me to some docs describing
Lustre internals (OSTs, OSSs, OBDs) and the control flow when a read/write call
is invoked by a client? Thanks.
2007 Oct 01
1
fsck ldiskfs-backed OSTs?
There are references to running fsck on the lustre OSTs after a crash
or power failure. However, after downloading the ClusterFS
e2fsprogs and building them, e2fsck does not recognize our ldiskfs-
based OSTs. Is there a way to fsck the ldiskfs-based OSTs?
Thanks,
Charlie Taylor
UF HPC Center
2007 Dec 13
1
MPI-Blast + Lustre
Does anyone have any experience with MpiBlast and Lustre? We have
MpiBlast-1.4.0-pio and lustre-1.6.3 and we are seeing some pretty
poor performance with most of the mpiblast threads spending 20% to
50% of their time in disk wait. We have the genbank nt database
split into 24 fragments (one for each of our OSTs, 3 per OSS). The
individual fragments are not striped due to the
2008 Mar 03
1
Quota setup fails because of OST ordering
Hi all,
after installing a Lustre test file system consisting of 34 OSTs, I
encountered a strange error when trying to set up quotas:
lfs quotacheck gave me an "Input/Output error", while in
/var/log/kern.log I found a Lustre error
LustreError: 20807:0:(quota_check.c:227:lov_quota_check()) lov idx 32
inactive
Indeed, in /proc/fs/lustre/lov/.../target_obd all 34 OSTs were listed
2010 Jul 13
4
Enable async journals
Hi all,
we use SLES 11 and Lustre 1.8.1.1 + patches and would like to convert a Lustre FS
using external journals to one with async journals enabled.
Question is whether the procedure:
umount <filesystem> on all clients
umount <osts> on all OSSes
e2fsck <ost-device> on all OSSes for all OSTs
tune2fs -O ^has_journal <ost-device> on all
2013 Mar 11
4
Understanding lustre setup ..
Hello,
I have been reading
http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf for setting up
Hadoop over lustre.
Generally in a Hadoop setup, we have 1 Namenode and a varying number of datanodes.
If I want to set up the same with Lustre as the backend, in the document
it is mentioned that:
".............Our experiments run on cluster with 8 nodes in total,
one is mds/namenode, the rest are
2007 Nov 07
9
How To change server recovery timeout
Hi,
Our lustre environment is:
2.6.9-55.0.9.EL_lustre.1.6.3smp
I would like to change recovery timeout from default value 250s to
something longer
I tried example from manual:
set_timeout <secs> Sets the timeout (obd_timeout) for a server
to wait before failing recovery.
We performed that experiment on our test lustre installation with one
OST.
storage02 is our OSS
[root at