Chris, we addressed this same issue with our setup here at PNNL. To deal with all the device-name changes that can happen in the Linux boot process (failover, Fibre Channel, chance), we used scsidev (http://www.garloff.de/kurt/linux/scsidev/) to map the drives to a consistent place on the OSTs and MDSs. This way an OST will always find its disk at /dev/scsi/Xdg1 on both OSTs in the failover pair. The /dev/scsi directory could be named whatever you choose.

Evan

On Wed, 2004-03-31 at 20:32, Phil Schwan wrote:
> Hi Chris--
>
> Chris Samuel wrote:
> >
> > A very quick question (as I'm logged in from home and I need to go cook!).
> >
> > If setting up failover MDS and active/active failover OSTs, is it
> > necessary to ensure that the device names match over nodes, i.e.
> > that /dev/sdb1 on node1 is /dev/sdb1 on node2, or is it permissible
> > to have /dev/sdc1 on node1 be the same partition as /dev/sdb1 on node2?
>
> This is no problem -- extremely common, in fact.
>
> -Phil

-- 
-------------------------
Evan Felix
Administrator of Supercomputer #5 in Top 500, Nov 2003
Environmental Molecular Sciences Laboratory
Pacific Northwest National Laboratory
Operated for the U.S. DOE by Battelle
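A quick sanity check that goes with this approach: whether you rely on scsidev names or plain /dev/sdX, you can confirm that the path each node uses really points at the same partition by comparing the ext3 filesystem UUID from both sides (Lustre targets are ext3 underneath, so this works once the target has been formatted). The hostnames and device paths below are made up, and this is a rough sketch rather than anything from the Lustre tools:

    #!/bin/sh
    # Compare the filesystem UUID of a shared OST partition as seen from
    # each node of a failover pair.  Hostnames and paths are hypothetical.
    NODE_A=oss1
    NODE_B=oss2
    DEV_A=/dev/scsi/Xdg1      # stable name created by scsidev on oss1
    DEV_B=/dev/scsi/Xdg1      # stable name created by scsidev on oss2

    # tune2fs -l dumps the superblock, including a "Filesystem UUID:" line.
    UUID_A=$(ssh $NODE_A "tune2fs -l $DEV_A" | awk '/Filesystem UUID/ {print $3}')
    UUID_B=$(ssh $NODE_B "tune2fs -l $DEV_B" | awk '/Filesystem UUID/ {print $3}')

    if [ "$UUID_A" = "$UUID_B" ]; then
        echo "OK: $DEV_A on $NODE_A and $DEV_B on $NODE_B are the same partition"
    else
        echo "WARNING: UUIDs differ ($UUID_A vs $UUID_B) -- check the mapping"
    fi

Since the UUID lives in the superblock rather than in the device name, it survives the boot-order reshuffling that motivates scsidev in the first place.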
Hi Chris--

Chris Samuel wrote:
>
> A very quick question (as I'm logged in from home and I need to go cook!).
>
> If setting up failover MDS and active/active failover OSTs, is it
> necessary to ensure that the device names match over nodes, i.e.
> that /dev/sdb1 on node1 is /dev/sdb1 on node2, or is it permissible
> to have /dev/sdc1 on node1 be the same partition as /dev/sdb1 on node2?

This is no problem -- extremely common, in fact.

-Phil
On Tue, 6 Apr 2004 04:47 am, Evan Felix wrote:

> Chris, we addressed this same issue with our setup here at PNNL

Thanks for the reply Evan. How are you folks at PNNL finding Lustre?

Here it has been pretty bad. :-(

We've had to drop our attempts to experiment with it; the recommended RPM
kernel that's available seems very fragile, with a number of NFS-related
panics without any Lustre modules loaded on a system that used to be rock
solid, and some unexplained crashes on other Lustre kernel nodes that seem
too coincidental.

My guess is that the heavy modifications of the kernel are creating unforeseen
side effects that affect the stability of non-Lustre parts that may not be
being exercised properly in testing.

I also have strong reservations about the moderated nature of the mailing
list; certainly my previous email regarding NFS panics hasn't made it through
to the list. Not a good model.

I'm afraid we're probably going to be steering clear of it now for some time;
once bitten, twice shy.

-- 
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
Hi Chris--

Chris Samuel wrote:
>
> Thanks for the reply Evan. How are you folks at PNNL finding Lustre?
>
> Here it has been pretty bad. :-(
>
> We've had to drop our attempts to experiment with it; the recommended RPM
> kernel that's available seems very fragile, with a number of NFS-related
> panics without any Lustre modules loaded on a system that used to be rock
> solid, and some unexplained crashes on other Lustre kernel nodes that seem
> too coincidental.
>
> My guess is that the heavy modifications of the kernel are creating unforeseen
> side effects that affect the stability of non-Lustre parts that may not be
> being exercised properly in testing.

I'm sorry to hear that you're having trouble; although we use the Lustre
kernels in very NFS-intensive environments, they are almost exclusively used
as NFS clients, not NFS servers. You are correct that the NFS server in the
patched kernel is not tested well enough, and I will make sure that our test
suite grows to include it.

In the short term, it is a simple matter to remove those NFS server patches
while we resolve the issues; those patches were not written by CFS, in fact,
but by including them we certainly take responsibility for them. If you have
not given up entirely, I'm happy to upload a new set of RPMs without patches
to the NFS server. The only fallout will be that you won't be able to
re-export a Lustre file system via NFS, but I suspect that you were not
planning to do that anyways.

If you have other reproducible crashes, I hope you will share them with us.
We have not received many bug reports for Lustre 1.0.4.

> I also have strong reservations about the moderated nature of the mailing
> list; certainly my previous email regarding NFS panics hasn't made it through
> to the list. Not a good model.

The mailing list is a service to the Lustre community, through which we
attempt to provide free advice. While we want to encourage a vibrant Lustre
user group, I also need to make sure that we remain viable. We have no
expensive hardware or license fees to sell, only our time, and that means
putting our paying customers first.

That all being said, I can certainly understand if lustre-discuss is not
meeting your technical support needs. It has been three business days since
you wrote that email, and sometimes we only get around to cleaning house on
lustre-discuss once per week. If you require faster turnaround, we have a
support option which I think is priced very inexpensively. As I believe Evan
will confirm, our customers are well cared for, and he knows precisely whose
cage to rattle if something is not being addressed in a timely fashion.

> I'm afraid we're probably going to be steering clear of it now for some time;
> once bitten, twice shy.

I can understand being cautious. But if you change your mind, we'd love to
help resolve your issues, even if it takes slightly longer than you had hoped.

Thanks--

-Phil
On Mon, 2004-04-05 at 18:08, Chris Samuel wrote:
> On Tue, 6 Apr 2004 04:47 am, Evan Felix wrote:
>
> > Chris, we addressed this same issue with our setup here at PNNL
>
> Thanks for the reply Evan. How are you folks at PNNL finding Lustre?

We have been pretty happy here. I personally worked very closely with early
beta code and got to know Lustre pretty well about a year ago. With the help
of CFS and HP (our system vendor) we built a very stable version to put on
our system as it went into production last July. Since that time we have had
4 major Lustre filesystem issues, and all of them were related to hardware
failure on the servers.

> Here it has been pretty bad. :-(
>
> We've had to drop our attempts to experiment with it; the recommended RPM
> kernel that's available seems very fragile, with a number of NFS-related
> panics without any Lustre modules loaded on a system that used to be rock
> solid, and some unexplained crashes on other Lustre kernel nodes that seem
> too coincidental.

We have never run with any NFS Lustre patches...

> My guess is that the heavy modifications of the kernel are creating unforeseen
> side effects that affect the stability of non-Lustre parts that may not be
> being exercised properly in testing.
>
> I also have strong reservations about the moderated nature of the mailing
> list; certainly my previous email regarding NFS panics hasn't made it through
> to the list. Not a good model.

I don't like it much either, but I don't post here much. The IRC channel
seems to be much more responsive, and as Phil stated in another e-mail, I can
pretty much find someone watching the IRC channel 24 hours a day.

> I'm afraid we're probably going to be steering clear of it now for some time;
> once bitten, twice shy.

We have been happy with Lustre. Recently I needed to move 8 TB off somewhere
in a few days, and it took me about 3 hours to create/build/deploy a
13-terabyte Lustre filesystem with 6 3.5 TB IDE-based storage bricks. I used
the stock 1.0.4 RPM kernels from the web site. Once up, it has worked for 4
weeks now. I'm tearing it down today, but it has worked very well.

Evan
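For anyone wondering what "create/build/deploy" looked like with the 1.0.x toolchain: the configuration was written out with lmc and then applied on each node with lconf. The sketch below is a reconstruction from memory of that style of config, not Evan's actual setup; the hostnames, devices, and stripe settings are invented, and some option names may differ between releases, so check it against the example scripts shipped with the 1.0.4 RPMs before trusting it.

    #!/bin/sh
    # Sketch of a 1.0.x-style Lustre config: one MDS plus six OST bricks
    # gathered into a single LOV so clients see one striped filesystem.
    # All hostnames, devices, and sizes here are hypothetical.
    CONFIG=scratch.xml

    lmc -o $CONFIG --add node --node mds1
    lmc -m $CONFIG --add net  --node mds1 --nid mds1 --nettype tcp
    lmc -m $CONFIG --add mds  --node mds1 --mds scratch-mds --fstype ext3 --dev /dev/sda3

    # One LOV (stripe group) that every OST joins.
    lmc -m $CONFIG --add lov --lov lov1 --mds scratch-mds \
        --stripe_sz 1048576 --stripe_cnt 1 --stripe_pattern 0

    for n in 1 2 3 4 5 6; do
        lmc -m $CONFIG --add node --node oss$n
        lmc -m $CONFIG --add net  --node oss$n --nid oss$n --nettype tcp
        lmc -m $CONFIG --add ost  --node oss$n --lov lov1 --fstype ext3 --dev /dev/sdb1
    done

    # Generic client entry and a mount point.
    lmc -m $CONFIG --add net  --node client --nid '*' --nettype tcp
    lmc -m $CONFIG --add mtpt --node client --path /mnt/scratch --mds scratch-mds --lov lov1

    # Then, roughly: "lconf --reformat --node <hostname> scratch.xml" on each
    # server the first time (this formats the targets), and
    # "lconf --node client scratch.xml" on the clients.

The single LOV is the piece that makes the six separate bricks show up to clients as one large striped filesystem.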
Hi all,

A very quick question (as I'm logged in from home and I need to go cook!).

If setting up failover MDS and active/active failover OSTs, is it
necessary to ensure that the device names match over nodes, i.e.
that /dev/sdb1 on node1 is /dev/sdb1 on node2, or is it permissible
to have /dev/sdc1 on node1 be the same partition as /dev/sdb1 on node2?

cheers!
Chris

-- 
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia