Displaying 20 results from an estimated 10000 matches similar to: "best practice for lustre clustre startup"
2010 Sep 09
1
What's the correct sequence to umount multiple Lustre file systems
Any recommendations about the sequence for unmounting multiple Lustre file systems with a combined MGS/MDT or a separate MGS and MDT? Thanks.
Ming
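A commonly recommended order is the reverse of the mount order: unmount clients first, then the MDT, then the OSTs, with the MGS last. A minimal sketch, assuming hypothetical mount points:

    umount /mnt/lustre   # on every client
    umount /mnt/mdt      # on the MDS (stops the MDT)
    umount /mnt/ost0     # on each OSS, once per OST
    umount /mnt/mgs      # on the MGS, last; a combined MGS/MDT goes down with the MDT step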
2008 Mar 07
2
Multihomed question: want Lustre over IB and Ethernet
Chris,
Perhaps you need to perform some write_conf-like command. I'm not sure if this is needed in 1.6 or not.
Shane
----- Original Message -----
From: lustre-discuss-bounces at lists.lustre.org <lustre-discuss-bounces at lists.lustre.org>
To: lustre-discuss <lustre-discuss at lists.lustre.org>
Sent: Fri Mar 07 12:03:17 2008
Subject: Re: [Lustre-discuss] Multihomed
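For reference, a multihomed LNET setup is usually expressed through modprobe options; a rough sketch, assuming interfaces ib0 and eth0:

    # /etc/modprobe.d/lustre.conf -- advertise both an IB NID and a TCP NID
    options lnet networks=o2ib0(ib0),tcp0(eth0)

If the NIDs of already-formatted targets change, the configuration logs generally need to be regenerated with tunefs.lustre --writeconf on the unmounted target devices.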
2013 Dec 17
2
Setting up a lustre zfs dual mgs/mdt over tcp - help requested
Hi all,
Here is the situation:
I have two nodes, MDS1 and MDS2 (10.0.0.22, 10.0.0.23), which I wish to use as a
failover MGS and an active/active MDT with ZFS.
I have a JBOD shelf with 12 disks, seen by both nodes as DAS (the
shelf has two SAS ports, connected to a SAS HBA on each node), and I
am using Lustre 2.4 on CentOS 6.4 x64.
I have created 3 zfs pools:
1. mgs:
# zpool
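A rough sketch of the usual steps for a ZFS-backed MGS and MDT on Lustre 2.4; the pool, dataset, and device names here are assumptions:

    zpool create mgspool mirror sdb sdc
    mkfs.lustre --mgs --backfstype=zfs mgspool/mgs
    mkfs.lustre --mdt --backfstype=zfs --fsname=lustre --index=0 \
        --mgsnode=10.0.0.22@tcp --mgsnode=10.0.0.23@tcp mgspool/mdt0

Repeating --mgsnode is the documented way to declare a failover pair.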
2007 Nov 07
9
How To change server recovery timeout
Hi,
Our lustre environment is:
2.6.9-55.0.9.EL_lustre.1.6.3smp
I would like to change the recovery timeout from the default value of 250s to
something longer.
I tried the example from the manual:
set_timeout <secs> Sets the timeout (obd_timeout) for a server
to wait before failing recovery.
We performed that experiment on our test Lustre installation with one
OST.
storage02 is our OSS
[root at
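On 1.6 the obd_timeout can be raised cluster-wide from the MGS with conf_param, or temporarily via /proc on each server; a sketch, assuming a file system named testfs and a 600-second value:

    lctl conf_param testfs.sys.timeout=600    # permanent, run on the MGS
    echo 600 > /proc/sys/lustre/timeout       # temporary, per server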
2007 Oct 15
3
iptables rules for lustre 1.6.x and MGS recovery procedures
Hi,
I would like to know which TCP/UDP ports I should keep open in my
firewall policies on my MGS server so that the MGS server can be
fire-walled. Also, in the event of loss of the MGT, would it be possible
to recreate the MGT without losing data or bringing the file system
down (i.e. by using cached information from the MDTs and OSTs)?
Thanks
Anand
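LNET over TCP listens on port 988 by default, so a minimal rule would look roughly like the following; the source subnet is an assumption:

    # allow incoming LNET/TCP connections from the cluster network
    iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 988 -j ACCEPT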
2007 Nov 19
6
Dedicated MGS?
This may be in the documentation. If so, I missed it.
If a site has multiple Lustre file systems, the documentation implies
that there only needs to be a single MGS for an entire site
(regardless of the number of file systems). However, I also know
it is fairly common to have a combined MGS/MDT. So here are the
questions.
1. If we are going to have several Lustre file systems,
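When one MGS serves several file systems, each target is simply formatted pointing at that MGS; a sketch with assumed host and device names:

    mkfs.lustre --fsname=fs1 --mdt --mgsnode=mgs16@tcp0 /dev/sdb
    mkfs.lustre --fsname=fs2 --mdt --mgsnode=mgs16@tcp0 /dev/sdc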
2013 Feb 12
2
Lost folders after changing MDS
OK, so our old MDS had hardware issues, so I configured a new MGS/MDS on a VM (this is a backup Lustre file system, and I wanted to separate the MGS/MDS from the OSS of the previous setup), and then did this:
For example:
mount -t ldiskfs /dev/old /mnt/ost_old
mount -t ldiskfs /dev/new /mnt/ost_new
rsync -aSv /mnt/ost_old/ /mnt/ost_new
# note trailing slash on ost_old/
If you are unable to connect both
2007 Nov 23
2
How to remove OST permanently?
All,
I've added a new 2.2 TB OST to my cluster easily enough, but this new
disk array is meant to replace several smaller OSTs that I used to have,
which were only 120 GB, 500 GB, and 700 GB.
Adding an OST is easy, but how do I REMOVE the small OSTs that I no
longer want to be part of my cluster? Is there a command to tell Lustre
to move all the file stripes off one of the nodes?
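The usual recipe is to deactivate the OST so no new objects are allocated on it, then locate and re-copy the files that have stripes there; a sketch, assuming fsname "lustre" and OST index 2:

    lctl conf_param lustre-OST0002.osc.active=0     # on the MGS: stop allocations permanently
    lfs find --obd lustre-OST0002_UUID /mnt/lustre  # on a client: list affected files, then copy them to restripe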
2013 Oct 17
3
Speeding up configuration log regeneration?
Hi,
We run a four-node Lustre 2.3 setup, and I needed to both change the hardware
under the MGS/MDS and reassign an OSS IP. At the same time, I added a brand
new 10GbE network to the system, which was the reason for the MDS hardware
change.
I ran tunefs.lustre --writeconf as per chapter 14.4 of the Lustre Manual,
and everything mounts fine. Log regeneration apparently works, since
it seems to do something, but
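For reference, the writeconf procedure is roughly: stop clients and all targets, regenerate the logs on the MDT first and then on every OST, and remount MGS/MDT before the OSTs; device names below are assumptions:

    tunefs.lustre --writeconf /dev/mdtdev     # with everything unmounted, MDT first
    tunefs.lustre --writeconf /dev/ostdev     # repeat on every OST
    mount -t lustre /dev/mdtdev /mnt/mdt      # remount MGS/MDT first, then the OSTs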
2007 Nov 12
8
More failover issues
In 1.6.0, when creating an MDT, you could specify multiple --mgsnode options
and it would fail over between them. 1.6.3 only seems to take the last one,
and --mgsnode=192.168.1.252@o2ib:192.168.1.253@o2ib doesn't seem to fail over
to the other node. Any ideas how to get around this?
Robert
Robert LeBlanc
College of Life Sciences Computer Support
Brigham Young University
leblanc at
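The documented form is to repeat the --mgsnode option, once per failover MGS node, rather than joining NIDs with a colon; a sketch with an assumed device:

    mkfs.lustre --ost --fsname=lustre --mgsnode=192.168.1.252@o2ib \
        --mgsnode=192.168.1.253@o2ib /dev/sdb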
2012 Nov 02
3
lctl ping of Pacemaker IP
Greetings!
I am working with Lustre 2.1.2 on RHEL 6.2. First I configured it
using the standard defaults over TCP/IP. Everything worked very
nicely using a real, static --mgsnode=a.b.c.x value, which was the
actual IP of the MGS/MDS system1 node.
I am now trying to integrate it with Pacemaker-1.1.7. I believe I
have most of the set-up completed with a particular exception. The
"lctl
2007 Mar 20
15
How to bypass failed OST without blocking?
Hi
I want my Lustre setup to do the following when an OST fails: if a file
has stripe data on the failed OST, any operation on that file should
return an IO error without blocking, while at the same time I can still
create and read/write new files, or read/write files that have no stripe
data on the failed OST, without blocking.
What should I do? How do I configure this?
thanks!
swin
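One approach from that era is to deactivate the OSC for the failed OST on the clients, so operations on affected files fail fast instead of blocking; the device number below is hypothetical and comes from lctl dl:

    lctl dl                     # find the osc device number for the failed OST
    lctl --device 7 deactivate  # 7 is hypothetical; use the number lctl dl reports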
2013 Mar 11
4
Understanding Lustre setup
Hello,
I have been reading
http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf for setting up
Hadoop over Lustre.
Generally in a Hadoop setup, we have one NameNode and some number of DataNodes.
If I want to set up the same with Lustre as the backend, the document
mentions that:
".............Our experiments run on cluster with 8 nodes in total,
one is mds/namenode, the rest are
2008 Feb 04
32
Lustre clients getting evicted
on our cluster that has been running Lustre for about 1 month. I have
1 MDT/MGS and 1 OSS with 2 OSTs.
Our cluster uses all GigE and has about 608 nodes / 1854 cores.
We have a lot of jobs that die and/or go into high IO wait; strace
shows processes stuck in fstat().
The big problem, I think, and I would like some feedback on it, is that
of these 608 nodes, 209 of them have in dmesg
2008 Jan 02
9
lustre quota problems
Hello,
I've several problems with quota on our test cluster:
When I set the quota for a user to a given value (e.g. the values which
are provided in the operations manual), I'm able to write exactly the
amount which is set with setquota.
But when I delete the file(s), I'm not able to use this space again.
Here is what I've done in detail:
lfs checkquota
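For reference, the usual sequence is roughly the following; the user name and limits are assumptions, and the setquota syntax (positional limits vs. -b/-B/-i/-I flags) varies between Lustre versions:

    lfs quotacheck -ug /mnt/lustre          # (re)build the quota files
    lfs setquota -u bob 0 1000000 0 10000 /mnt/lustre
    lfs quota -u bob /mnt/lustre            # verify usage and limits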
2008 Feb 05
2
obdfilter/datafs-OST0000/recovery_status
I'm evaluating Lustre. I'm trying what I think is a basic/simple
Ethernet config, with the MDT and OST on the same node. Can someone tell
me if the following (~150 second recovery occurring when a small 190 GB
OST is re-mounted) is expected behavior, or if I'm missing something?
I thought I would send this and continue with the eval while awaiting a
response.
I'm using
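The recovery window can be watched directly through /proc while the OST is re-mounted, e.g.:

    cat /proc/fs/lustre/obdfilter/datafs-OST0000/recovery_status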
2013 Mar 18
1
lustre showing inactive devices
I installed 1 MDS, 2 OSS/OST nodes, and 2 Lustre clients. My MDS shows:
[code]
[root at MDS ~]# lctl list_nids
10.94.214.185@tcp
[root at MDS ~]#
[/code]
On Lustre Client1:
[code]
[root at lustreclient1 lustre]# lfs df -h
UUID                   bytes    Used  Available  Use%  Mounted on
lustre-MDT0000_UUID     4.5G  274.3M       3.9G    6%  /mnt/lustre[MDT:0]
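Inactive devices normally show up in lctl dl output and can be reactivated per device; a sketch, with a hypothetical device number:

    lctl dl                    # list devices and their state
    lctl --device 9 activate   # 9 is hypothetical; take it from lctl dl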
2008 Jan 10
4
1.6.4.1 - active client evicted
Hi!
We've started to poke and prod at Lustre 1.6.4.1, and it seems to
mostly work (we haven't had it OOPS on us yet like the earlier
1.6 versions did).
However, we had this weird incident where an active client (it was
copying 4GB files and running ls at the time) got evicted by the MDS
and all OSTs. After a while the logs indicate that it did recover the
connection
2008 Feb 22
6
2.6.23 client systems with any compatible server
I want to have a Lustre client running on a system with a 2.6.23.12
kernel. (The reason is that there is a special patch that is required
for these 60+ quad-core AMD Opteron systems that we have, and the patch
is currently only available for this 2.6.23.12 kernel.)
Does anyone have a recommendation of how I should get a client and
then a compatible server?
For the server, we only need minimal
2007 Nov 16
5
Lustre Debug level
Hi,
The Lustre manual 1.6 v18 says that in production the Lustre debug level
should be set fairly low. The manual also says that I can verify that
level by running the following commands:
# sysctl portals.debug
This gives the following error:
error: 'portals.debug' is an unknown key
cat /proc/sys/lnet/debug
gives output:
ioctl neterror warning error emerg ha config console
cat
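On 1.6 the old portals.* sysctl tree was renamed to lnet.*, so the equivalents are roughly:

    sysctl lnet.debug                                            # read the current debug mask
    echo "warning error emerg console" > /proc/sys/lnet/debug    # set a low production mask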