thr3ads.net - similar to: "Balancing I/O Load"

Displaying 20 results from an estimated 3000 matches similar to: "Balancing I/O Load"

2013 Apr 29

OSTs inactive on one client (only)

Hi everyone, I have seen this question here before, but without a very satisfactory answer. One of our half a dozen clients has lost access to a set of OSTs:
> lfs osts
OBDS::
0: lustre-OST0000_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID INACTIVE
3: lustre-OST0003_UUID INACTIVE
4: lustre-OST0004_UUID INACTIVE
5: lustre-OST0005_UUID ACTIVE
6: lustre-OST0006_UUID ACTIVE

Cannot send after transport endpoint shutdown (-108)

2008 Mar 04

This morning I've had both my infiniband and tcp lustre clients hiccup. They are evicted from the server presumably as a result of their high load and consequent timeouts. My question is: why don't the clients re-connect? The infiniband and tcp clients both give the following message when I type "df": Cannot send after transport endpoint shutdown (-108). I've

NFS Performance

2008 Apr 15

Hi, With help from Oleg we got the right patches applied and NFS working well. Maximum performance was about 60 MB/sec. Last week that dropped to about 12.5 MB/sec and I cannot find a reason. Lustre clients all obtain 100+ MB/sec on GigE. Each OST is good for 270 MB/sec. When mounting the client on one of the OSSs I get 230 MB/sec. Seems the speed is there. How can NFS and Lustre be tuned

How to track down a latency/timing problem

2010 Aug 12

Hello Lustre Experts, I am trying to solve a problem with very slow "ls" and other operations on large numbers of files, despite good overall read/write rates. We are running a small cluster of 3 OSSs with 9 OSTs, 1 MDS (with SSD MDT) and currently two clients. All server nodes are centos 5.2 with lustre 1.8.1 while the clients are centos 5.4 with lustre 1.8.3. All components are networked with DDR

Large Corosync/Pacemaker clusters

2012 Oct 19

Hi, We're setting up fairly large Lustre 2.1.2 filesystems, each with 18 nodes and 159 resources all in one Corosync/Pacemaker cluster as suggested by our vendor. We're getting mixed messages from our vendor and others on how large a Corosync/Pacemaker cluster will work well. 1. Are there Lustre Corosync/Pacemaker clusters out there of this size or larger? 2.

How to remove OST permanently?

2007 Nov 23

All, I've added a new 2.2 TB OST to my cluster easily enough, but this new disk array is meant to replace several smaller OSTs that I used to have, which were only 120 GB, 500 GB, and 700 GB. Adding an OST is easy, but how do I REMOVE the small OSTs that I no longer want to be part of my cluster? Is there a command to tell Lustre to move all the file stripes off one of the nodes?

Lustre drawback

2007 Dec 13

Hello everybody, at the following pages: http://www.rit.edu/~rc/docs/Survey_of_Clustered_Parallel_File_Systems_004_LANL.ppt http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/tools/threading/238284.htm?page=2 I read: "[...] Currently, one additional drawback to Lustre is that a Lustre client cannot be on a server that is providing OSTs. This solution is being worked on and may be

bad 1.6.3 striped write performance

2007 Nov 26

Hi, I'm seeing what can only be described as dismal striped write performance from lustre 1.6.3 clients :-/ 1.6.2 and 1.6.1 clients are fine. 1.6.4rc3 clients (from cvs a couple of days ago) are also terrible. the below shows that the OS (centos4.5/5) or fabric (gigE/IB) or lustre version on the servers doesn't matter - the problem is with the 1.6.3 and 1.6.4rc3 client kernels

How to bypass failed OST without blocking?

2007 Mar 20

Hi, I want Lustre to behave as follows when an OST has failed: if a file has stripe data on the failed OST, any operation on that file returns an I/O error without blocking, while at the same time I can still create and read/write new files, or read/write files that have no stripe data on the failed OST, without blocking. What should I do? How do I configure this? Thanks! swin

write RPC & congestion

2010 Aug 17

Hi, thanks for the previous help. I have some questions about Lustre RPC and the sequence of events that occur during large concurrent write() calls involving many processes and a large data size per process. I understand there is a mechanism of flow control by credits, but I'm a little unclear on how it works in general after reading the "networking & io protocol" white paper. Is

Multihomed question: want Lustre over IB and Ethernet

2008 Mar 07

Chris, Perhaps you need to perform some write_conf like command. I'm not sure if this is needed in 1.6 or not. Shane ----- Original Message ----- From: lustre-discuss-bounces at lists.lustre.org <lustre-discuss-bounces at lists.lustre.org> To: lustre-discuss <lustre-discuss at lists.lustre.org> Sent: Fri Mar 07 12:03:17 2008 Subject: Re: [Lustre-discuss] Multihomed

Lustre 1.0.2 packages available

2004 Jan 11

Greetings-- Packages for Lustre 1.0.2 are now available in the usual place http://www.clusterfs.com/download.html This bug-fix release resolves a number of issues, of which a few are user-visible: - the default debug level is now a more reasonable production value - zero-copy TCP is now enabled by default, if your hardware supports it - you should encounter fewer allocation failures

lustre-1.8 OSD

2007 Nov 30

lustre-1.8 has OSD structures in place; what do I need to add to make it work with the OSD T10 standard? Could anybody point me to some docs describing lustre internals: OSTs, OSSs, OBDs, and the control flow when a read/write call is invoked by a client? Thanks.

fsck ldiskfs-backed OSTs?

2007 Oct 01

There are references to running fsck on the lustre OSTs after a crash or power failure. However, after downloading the ClusterFS e2fsprogs and building them, e2fsck does not recognize our ldiskfs-based OSTs. Is there a way to fsck the ldiskfs-based OSTs? Thanks, Charlie Taylor UF HPC Center

MPI-Blast + Lustre

2007 Dec 13

Does anyone have any experience with MpiBlast and Lustre? We have MpiBlast-1.4.0-pio and lustre-1.6.3 and we are seeing some pretty poor performance, with most of the mpiblast threads spending 20% to 50% of their time in disk wait. We have the genbank nt database split into 24 fragments (one for each of our OSTs, 3 per OSS). The individual fragments are not striped due to the

Quota setup fails because of OST ordering

2008 Mar 03

Hi all, after installing a Lustre test file system consisting of 34 OSTs, I encountered a strange error when trying to set up quotas: lfs quotacheck gave me an "Input/Output error", while in /var/log/kern.log I found a Lustre error:
LustreError: 20807:0:(quota_check.c:227:lov_quota_check()) lov idx 32 inactive
Indeed, in /proc/fs/lustre/lov/.../target_obd all 34 OSTs were listed

Enable async journals

2010 Jul 13

Hi all, we use SLES 11 and Lustre 1.8.1.1 + patches and would like to convert a lustre FS using external journals to one with async journals enabled. The question is whether the procedure:
umount <filesystem> on all clients
umount <osts> on all OSSes
e2fsck <ost-device> on all OSSes for all OSTs
tune2fs -O ^has_journal <ost-device> on all

Understanding lustre setup ..

2013 Mar 11

Hello, I have been reading http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf for setting up Hadoop over lustre. Generally in hadoop setup, we have 1 Namenode and various number of datanodes. If I want to setup the same keeping Lustre as backend, in the document it is mentioned that: ".............Our experiments run on cluster with 8 nodes in total, one is mds/namenode, the rest are

How To change server recovery timeout

2007 Nov 07

Hi, Our lustre environment is: 2.6.9-55.0.9.EL_lustre.1.6.3smp. I would like to change the recovery timeout from the default value of 250s to something longer. I tried the example from the manual: set_timeout <secs> Sets the timeout (obd_timeout) for a server to wait before failing recovery. We performed that experiment on our test lustre installation with one OST. storage02 is our OSS [root at
