similar to: How to track down a latency/timing problem

Displaying 20 results from an estimated 3000 matches similar to: "How to track down a latency/timing problem"

2007 Nov 29
2
Balancing I/O Load
We are seeing some disturbing (probably due to our ignorance) behavior from lustre 1.6.3 right now. We have 8 OSSs with 3 OSTs per OSS (24 physical LUNs). We just created a brand new lustre file system across this configuration using the default mkfs.lustre formatting options. We have this file system mounted across 400 clients. At the moment, we have 63 IOzone threads running
2008 Apr 15
4
NFS Performance
Hi, With help from Oleg we got the right patches applied and NFS working well. Maximum performance was about 60 MB/sec. Last week that dropped to about 12.5 MB/sec and I cannot find a reason. Lustre clients all obtain 100+ MB/sec on GigE. Each OST is good for 270 MB/sec. When mounting the client on one of the OSSs I get 230 MB/sec. Seems the speed is there. How can NFS and Lustre be tuned
2007 Nov 26
15
bad 1.6.3 striped write performance
Hi, I'm seeing what can only be described as dismal striped write performance from lustre 1.6.3 clients :-/ 1.6.2 and 1.6.1 clients are fine. 1.6.4rc3 clients (from cvs a couple of days ago) are also terrible. The below shows that the OS (centos4.5/5) or fabric (gigE/IB) or lustre version on the servers doesn't matter - the problem is with the 1.6.3 and 1.6.4rc3 client kernels
2012 Oct 19
6
Large Corosync/Pacemaker clusters
Hi, We're setting up fairly large Lustre 2.1.2 filesystems, each with 18 nodes and 159 resources all in one Corosync/Pacemaker cluster as suggested by our vendor. We're getting mixed messages on how large of a Corosync/Pacemaker cluster will work well between our vendor and others. 1. Are there Lustre Corosync/Pacemaker clusters out there of this size or larger? 2.
2013 Apr 29
1
OSTs inactive on one client (only)
Hi everyone, I have seen this question here before, but without a very satisfactory answer. One of our half a dozen clients has lost access to a set of OSTs: > lfs osts OBDS:: 0: lustre-OST0000_UUID ACTIVE 1: lustre-OST0001_UUID ACTIVE 2: lustre-OST0002_UUID INACTIVE 3: lustre-OST0003_UUID INACTIVE 4: lustre-OST0004_UUID INACTIVE 5: lustre-OST0005_UUID ACTIVE 6: lustre-OST0006_UUID ACTIVE
2007 Nov 30
1
lustre-1.8 OSD
lustre-1.8 has OSD structures in place, what do I need to add in to make it work with OSD T10 standard? could anybody point me to some docs mentioning lustre internals - OSTs, OSSs, OBDs, and control flow when a read/write call is invoked by a client. thanks.
2010 Jun 22
7
lnet infiniband config
Hi all, I'm getting my feet wet in the infiniband lake and of course I run into some problems. It would seem I got the compilation part of sles11 kernel 2.6.27 + Lustre 1.8.3 + ofed 1.4.2 right, because it allows me to see and use the infiniband fabric, and because ko2iblnd loads without any complaints. In /etc/modprobe.d/lustre (this is a Debian system, hence this subdir of
2010 Aug 17
18
write RPC & congestion
Hi, thanks for previous help. I have some questions about Lustre RPC and the sequence of events that occur during large concurrent write() calls involving many processes and a large data size per process. I understand there is a mechanism of flow control by credits, but I'm a little unclear on how it works in general after reading the "networking & io protocol" white paper. Is
2010 Sep 30
1
ldiskfs-ext4 interoperability question
Our current Lustre servers run version 1.8.1.1 with the regular ldiskfs. We are looking to expand our Lustre file system with new servers/storage and upgrade all the Lustre servers to 1.8.4 at the same time. We would like to make use of ldiskfs-ext4 on the new servers to use larger OSTs. I just want to confirm the following facts: 1. Is it possible to run different versions
2008 Mar 07
2
Multihomed question: want Lustre over IB and Ethernet
Chris, Perhaps you need to perform some write_conf-like command. I'm not sure if this is needed in 1.6 or not. Shane ----- Original Message ----- From: lustre-discuss-bounces at lists.lustre.org <lustre-discuss-bounces at lists.lustre.org> To: lustre-discuss <lustre-discuss at lists.lustre.org> Sent: Fri Mar 07 12:03:17 2008 Subject: Re: [Lustre-discuss] Multihomed
2008 Mar 04
16
Cannot send after transport endpoint shutdown (-108)
This morning I've had both my infiniband and tcp lustre clients hiccup. They are evicted from the server, presumably as a result of their high load and consequent timeouts. My question is: why don't the clients reconnect? The infiniband and tcp clients both give the following message when I type "df" - Cannot send after transport endpoint shutdown (-108). I've
2010 Jul 13
4
Enable async journals
Hi all, we use SLES 11 and Lustre 1.8.1.1 + patches and would like to convert a Lustre FS using external journals to one with async journals enabled. Question is whether the procedure: umount <filesystem> on all clients; umount <osts> on all OSSes; e2fsck <ost-device> on all OSSes for all OSTs; tune2fs -O ^has_journal <ost-device> on all
2010 Jul 05
4
Adding OST to online Lustre with quota
Hello, we wonder whether it is possible to add OSTs to a Lustre file system with quota support without taking it offline. We tried to do this but all quota information was lost. Despite the fact that the OST was formatted with quota support, we are receiving this error message: Lustre: 3743:0:(lproc_quota.c:447:lprocfs_quota_wr_type()) lustrefs-OST0016: quotaon failed because quota files
2010 Jul 08
5
No space left on device on not full filesystem
Hello, We are running Lustre 1.8.1 and have hit a "No space left on device" error when uploading 500 GB of small files (less than 100 KB each). The problem seems to depend on the number of files. If we remove one file, we can create one new file, even one of GB size; but if we haven't removed anything we can't create even a very small file, as an example using touch
2007 Dec 13
4
Lustre drawback
Hello everybody, at the following pages: http://www.rit.edu/~rc/docs/Survey_of_Clustered_Parallel_File_Systems_004_LANL.ppt http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/tools/threading/238284.htm?page=2 I read: "[...] Currently, one additional drawback to Lustre is that a Lustre client cannot be on a server that is providing OSTs. This solution is being worked on and may be
2008 Mar 03
1
Quota setup fails because of OST ordering
Hi all, after installing a Lustre test file system consisting of 34 OSTs, I encountered a strange error when trying to set up quotas: lfs quotacheck gave me an "Input/Output error", while in /var/log/kern.log I found a Lustre error LustreError: 20807:0:(quota_check.c:227:lov_quota_check()) lov idx 32 inactive Indeed, in /proc/fs/lustre/lov/.../target_obd all 34 OSTs were listed
2013 Mar 11
4
Understanding lustre setup ..
Hello, I have been reading http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf for setting up Hadoop over lustre. Generally in hadoop setup, we have 1 Namenode and various number of datanodes. If I want to setup the same keeping Lustre as backend, in the document it is mentioned that: ".............Our experiments run on cluster with 8 nodes in total, one is mds/namenode, the rest are
2012 Sep 27
4
Bad reporting inodes free
Hello, When I run a "df -i" in my clients I get 95% inodes used or 5% inodes free: Filesystem Inodes IUsed IFree IUse% Mounted on lustre-mds-01:lustre-mds-02:/cetafs 22200087 20949839 1250248 95% /mnt/data But if I run lfs df -i I get: UUID Inodes IUsed IFree I
2004 Jan 11
3
Lustre 1.0.2 packages available
Greetings-- Packages for Lustre 1.0.2 are now available in the usual place http://www.clusterfs.com/download.html This bug-fix release resolves a number of issues, of which a few are user-visible: - the default debug level is now a more reasonable production value - zero-copy TCP is now enabled by default, if your hardware supports it - you should encounter fewer allocation failures