Chris, we addressed this same issue with our setup here at PNNL. To deal with all the device-name changes that can happen in the Linux boot process (failover, Fibre Channel, chance), we used scsidev (http://www.garloff.de/kurt/linux/scsidev/) to map the drives to a consistent place on the OSTs and MDSs. This way an OST will always find its disk at /dev/scsi/Xdg1 on both OSTs in the failover pair. The /dev/scsi directory could be named whatever you choose.

Evan

On Wed, 2004-03-31 at 20:32, Phil Schwan wrote:
> Hi Chris--
>
> Chris Samuel wrote:
> >
> > A very quick question (as I'm logged in from home and I need to go cook!).
> >
> > If setting up failover MDS and active/active failover OSTs, is it
> > necessary to ensure that the device names match over nodes, i.e.
> > that /dev/sdb1 on node1 is /dev/sdb1 on node2, or is it permissible
> > to have /dev/sdc1 on node1 be the same partition as /dev/sdb1 on node2?
>
> This is no problem -- extremely common, in fact.
>
> -Phil

-- 
-------------------------
Evan Felix
Administrator of Supercomputer #5 in Top 500, Nov 2003
Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
Operated for the U.S. DOE by Battelle
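A quick sanity check that goes with this approach: whether you rely on scsidev names or plain /dev/sdX, you can confirm that the path each node uses really points at the same partition by comparing the ext3 filesystem UUID from both sides (Lustre targets are ext3 underneath, so this works once the target has been formatted). The hostnames and device paths below are made up, and this is a rough sketch rather than anything from the Lustre tools:

    #!/bin/sh
    # Compare the filesystem UUID of a shared OST partition as seen from
    # each node of a failover pair.  Hostnames and paths are hypothetical.
    NODE_A=oss1
    NODE_B=oss2
    DEV_A=/dev/scsi/Xdg1      # stable name created by scsidev on oss1
    DEV_B=/dev/scsi/Xdg1      # stable name created by scsidev on oss2

    # tune2fs -l dumps the superblock, including a "Filesystem UUID:" line.
    UUID_A=$(ssh $NODE_A "tune2fs -l $DEV_A" | awk '/Filesystem UUID/ {print $3}')
    UUID_B=$(ssh $NODE_B "tune2fs -l $DEV_B" | awk '/Filesystem UUID/ {print $3}')

    if [ "$UUID_A" = "$UUID_B" ]; then
        echo "OK: $DEV_A on $NODE_A and $DEV_B on $NODE_B are the same partition"
    else
        echo "WARNING: UUIDs differ ($UUID_A vs $UUID_B) -- check the mapping"
    fi

Since the UUID lives in the superblock rather than in the device name, it survives the boot-order reshuffling that motivates scsidev in the first place.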
Hi Chris--

Chris Samuel wrote:
>
> A very quick question (as I'm logged in from home and I need to go cook!).
>
> If setting up failover MDS and active/active failover OSTs, is it
> necessary to ensure that the device names match over nodes, i.e.
> that /dev/sdb1 on node1 is /dev/sdb1 on node2, or is it permissible
> to have /dev/sdc1 on node1 be the same partition as /dev/sdb1 on node2?

This is no problem -- extremely common, in fact.

-Phil
On Tue, 6 Apr 2004 04:47 am, Evan Felix wrote:

> Chris, we addressed this same issue with our setup here at PNNL

Thanks for the reply Evan. How are you folks at PNNL finding Lustre?

Here it has been pretty bad. :-(

We've had to drop our attempts to experiment with it; the recommended RPM
kernel that's available seems very fragile, with a number of NFS-related
panics without any Lustre modules loaded on a system that used to be rock
solid, and some unexplained crashes on other Lustre kernel nodes that seem
too coincidental.

My guess is that the heavy modifications of the kernel are creating unforeseen
side effects that affect the stability of non-Lustre parts that may not be
being exercised properly in testing.

I also have strong reservations about the moderated nature of the mailing
list; certainly my previous email regarding NFS panics hasn't made it through
to the list. Not a good model.

I'm afraid we're probably going to be steering clear of it now for some time;
once bitten, twice shy.

-- 
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
Hi Chris--

Chris Samuel wrote:
>
> Thanks for the reply Evan. How are you folks at PNNL finding Lustre?
>
> Here it has been pretty bad. :-(
>
> We've had to drop our attempts to experiment with it; the recommended RPM
> kernel that's available seems very fragile, with a number of NFS-related
> panics without any Lustre modules loaded on a system that used to be rock
> solid, and some unexplained crashes on other Lustre kernel nodes that seem
> too coincidental.
>
> My guess is that the heavy modifications of the kernel are creating unforeseen
> side effects that affect the stability of non-Lustre parts that may not be
> being exercised properly in testing.

I'm sorry to hear that you're having trouble; although we use the Lustre
kernels in very NFS-intensive environments, they are almost exclusively used
as NFS clients, not NFS servers. You are correct that the NFS server in the
patched kernel is not tested well enough, and I will make sure that our test
suite grows to include it.

In the short term, it is a simple matter to remove those NFS server patches
while we resolve the issues; those patches were not written by CFS, in fact,
but by including them we certainly take responsibility for them. If you have
not given up entirely, I'm happy to upload a new set of RPMs without patches
to the NFS server. The only fallout will be that you won't be able to
re-export a Lustre file system via NFS, but I suspect that you were not
planning to do that anyways.

If you have other reproducible crashes, I hope you will share them with us.
We have not received many bug reports for Lustre 1.0.4.

> I also have strong reservations about the moderated nature of the mailing
> list; certainly my previous email regarding NFS panics hasn't made it through
> to the list. Not a good model.

The mailing list is a service to the Lustre community, through which we
attempt to provide free advice. While we want to encourage a vibrant Lustre
user group, I also need to make sure that we remain viable. We have no
expensive hardware or license fees to sell, only our time, and that means
putting our paying customers first.

That all being said, I can certainly understand if lustre-discuss is not
meeting your technical support needs. It has been three business days since
you wrote that email, and sometimes we only get around to cleaning house on
lustre-discuss once per week. If you require faster turnaround, we have a
support option which I think is priced very inexpensively. As I believe Evan
will confirm, our customers are well cared for, and he knows precisely whose
cage to rattle if something is not being addressed in a timely fashion.

> I'm afraid we're probably going to be steering clear of it now for some time;
> once bitten, twice shy.

I can understand being cautious. But if you change your mind, we'd love to
help resolve your issues, even if it takes slightly longer than you had hoped.

Thanks--

-Phil
On Mon, 2004-04-05 at 18:08, Chris Samuel wrote:
> On Tue, 6 Apr 2004 04:47 am, Evan Felix wrote:
>
> > Chris, we addressed this same issue with our setup here at PNNL
>
> Thanks for the reply Evan. How are you folks at PNNL finding Lustre?

We have been pretty happy here. I personally worked very closely with early
beta code and got to know Lustre pretty well about a year ago. With the help
of CFS and HP (our system vendor) we built a very stable version to put on
our system as it went into production last July. Since that time we have had
4 major Lustre filesystem issues, and all of them were related to hardware
failure on the servers.

> Here it has been pretty bad. :-(
>
> We've had to drop our attempts to experiment with it; the recommended RPM
> kernel that's available seems very fragile, with a number of NFS-related
> panics without any Lustre modules loaded on a system that used to be rock
> solid, and some unexplained crashes on other Lustre kernel nodes that seem
> too coincidental.

We have never run with any NFS Lustre patches...

> My guess is that the heavy modifications of the kernel are creating unforeseen
> side effects that affect the stability of non-Lustre parts that may not be
> being exercised properly in testing.
>
> I also have strong reservations about the moderated nature of the mailing
> list; certainly my previous email regarding NFS panics hasn't made it through
> to the list. Not a good model.

I don't like it much either, but I don't post here much. The IRC channel
seems to be much more responsive, and as Phil stated in another e-mail, I can
pretty much find someone watching the IRC channel 24 hours a day.

> I'm afraid we're probably going to be steering clear of it now for some time;
> once bitten, twice shy.

We have been happy with Lustre. Recently I needed to move 8 TB off somewhere
in a few days, and it took me about 3 hours to create/build/deploy a
13-terabyte Lustre filesystem with 6 3.5 TB IDE-based storage bricks. I used
the stock 1.0.4 RPM kernels from the web site. Once up, it has worked for 4
weeks now. I'm tearing it down today, but it has worked very well.

Evan
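For anyone wondering what "create/build/deploy" looked like with the 1.0.x toolchain: the configuration was written out with lmc and then applied on each node with lconf. The sketch below is a reconstruction from memory of that style of config, not Evan's actual setup; the hostnames, devices, and stripe settings are invented, and some option names may differ between releases, so check it against the example scripts shipped with the 1.0.4 RPMs before trusting it.

    #!/bin/sh
    # Sketch of a 1.0.x-style Lustre config: one MDS plus six OST bricks
    # gathered into a single LOV so clients see one striped filesystem.
    # All hostnames, devices, and sizes here are hypothetical.
    CONFIG=scratch.xml

    lmc -o $CONFIG --add node --node mds1
    lmc -m $CONFIG --add net  --node mds1 --nid mds1 --nettype tcp
    lmc -m $CONFIG --add mds  --node mds1 --mds scratch-mds --fstype ext3 --dev /dev/sda3

    # One LOV (stripe group) that every OST joins.
    lmc -m $CONFIG --add lov --lov lov1 --mds scratch-mds \
        --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0

    for n in 1 2 3 4 5 6; do
        lmc -m $CONFIG --add node --node oss$n
        lmc -m $CONFIG --add net  --node oss$n --nid oss$n --nettype tcp
        lmc -m $CONFIG --add ost  --node oss$n --lov lov1 --fstype ext3 --dev /dev/sdb1
    done

    # Generic client entry and a mount point.
    lmc -m $CONFIG --add net  --node client --nid '*' --nettype tcp
    lmc -m $CONFIG --add mtpt --node client --path /mnt/scratch --mds scratch-mds --lov lov1

    # Then, roughly: "lconf --reformat --node <hostname> scratch.xml" on each
    # server the first time (this formats the targets), and
    # "lconf --node client scratch.xml" on the clients.

The single LOV is the piece that makes the six separate bricks show up to clients as one large striped filesystem.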
Hi all,

A very quick question (as I'm logged in from home and I need to go cook!).

If setting up failover MDS and active/active failover OSTs, is it
necessary to ensure that the device names match over nodes, i.e.
that /dev/sdb1 on node1 is /dev/sdb1 on node2, or is it permissible
to have /dev/sdc1 on node1 be the same partition as /dev/sdb1 on node2?

cheers!
Chris

-- 
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia