John White
2009-Nov-13 22:34 UTC
[Lustre-discuss] question about failnode with mixed networks
Hello Folks,

We have a Lustre instance served out over both o2ib and tcp. In a failover situation, it would appear that tcp-connected clients do not get the hint to switch over to the secondary MDS (I would assume the same is true for the OSSes as well). When I initially set up the file system, I specified --failnode for the @o2ib interfaces. Should I have also specified NIDs for @tcp0 during the fs construction? If so, is it possible to add this as an afterthought?

----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720
Andreas Dilger
2009-Nov-15 18:52 UTC
On 2009-11-13, at 14:34, John White wrote:
> When I initially set up the file system, I specified --failnode for
> the @o2ib interfaces, should I have also specified NIDs for the @tcp0
> during the fs construction? If so, is it possible to add this as an
> afterthought?

It is always safe to re-run tunefs.lustre --writeconf on all of the nodes to re-do the filesystem configuration.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
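To make "re-run --writeconf" concrete, here is a minimal sketch of the order commonly used when regenerating the configuration logs: everything unmounted first, the MDT rewritten and remounted before the OSTs, clients last. The `writeconf_plan` function, the device paths (`/dev/mdt0`, `/dev/ost0`, ...) and the `/mnt/<name>` mount points are hypothetical, and a combined MGS/MDT is assumed. The function only prints the commands; each one would be run on the server that owns that device.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print, in order, the server-side commands for
# regenerating Lustre configuration logs with "tunefs.lustre --writeconf".
# It only echoes the commands; nothing is executed.
#   $1     MDT device (a combined MGS/MDT is assumed)
#   $2...  OST devices
# Device paths and the /mnt/<name> mount points are placeholders.
writeconf_plan() {
    local mdt_dev=$1; shift
    local dev

    echo "# first unmount the filesystem on every client"
    # 1. Stop the servers: the MDT, then every OST.
    echo "umount $mdt_dev"
    for dev in "$@"; do echo "umount $dev"; done
    # 2. Regenerate the configuration logs: the MDT first, then every OST.
    echo "tunefs.lustre --writeconf $mdt_dev"
    for dev in "$@"; do echo "tunefs.lustre --writeconf $dev"; done
    # 3. Restart in the same order: MDT, then OSTs.
    echo "mount -t lustre $mdt_dev /mnt/$(basename "$mdt_dev")"
    for dev in "$@"; do echo "mount -t lustre $dev /mnt/$(basename "$dev")"; done
    echo "# finally remount the filesystem on the clients"
}

writeconf_plan /dev/mdt0 /dev/ost0 /dev/ost1
```

Printing rather than executing keeps the sketch safe to try and lets the plan be reviewed against your own layout (for example, a separate MGS would need its own steps).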
Brian J. Murrell
2009-Nov-16 14:51 UTC
On Fri, 2009-11-13 at 14:34 -0800, John White wrote:
> In a failover situation, it would appear that tcp connected clients do not get the hint to switch over to the secondary MDS

Clients don't (yet) get "hints" to switch servers. Clients continue to use a server until they don't get a response, at which time they cycle through their list of NIDs for the unresponsive service.

> When I initially set up the file system, I specified --failnode for the @o2ib interfaces,

Only the @o2ib interfaces?

> should I have also specified NIDs for the @tcp0 during the fs construction?

Yes. You specify the NIDs for all servers that should be considered for that service.

> If so, is it possible to add this as an afterthought?

You want tunefs.lustre.

b.
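Following Brian's pointer to tunefs.lustre, a minimal sketch, assuming the failover MDS has the made-up NIDs `10.0.0.2@o2ib` and `192.168.0.2@tcp0` and the target sits on a placeholder device: all NIDs of the same failover partner go comma-separated into one `--failnode` value. The hypothetical `failnode_cmd` helper only prints the command. `--erase-params` clears the previously stored parameters (including the old o2ib-only failover entry), which also drops anything else stored there, such as `--mgsnode` on an OST, so inspect the current settings with `tunefs.lustre --dryrun <device>` first and re-specify what you need. It is assumed that the configuration logs then have to be regenerated on all targets, as in Andreas's reply, before clients see the new NIDs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a tunefs.lustre command that records ALL NIDs
# (one per network) of a target's failover partner. Nothing is executed;
# review the output, then run it on the server with the target unmounted.
#   $1     target device (placeholder path)
#   $2...  NIDs of the failover partner
failnode_cmd() {
    local dev=$1; shift
    local IFS=,          # makes "$*" join the NIDs with commas
    echo "tunefs.lustre --erase-params --failnode=$* $dev"
}

failnode_cmd /dev/mdt0 10.0.0.2@o2ib 192.168.0.2@tcp0
# -> tunefs.lustre --erase-params --failnode=10.0.0.2@o2ib,192.168.0.2@tcp0 /dev/mdt0
```

Listing both networks in a single `--failnode` value keeps them attached to one partner node, instead of describing the tcp interface as if it were a separate server.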
John White
2009-Nov-24 21:19 UTC
Excellent, thanks for the replies. One more question: is there a --failnode equivalent for MGTs? Does Lustre support MGT/MGS failover?
Jeffrey Bennett
2009-Nov-24 21:54 UTC
Hi John,

Yes, you can configure more than one (failover) MGS node, but you have to tell the OSTs about it, in this way (example):

mkfs.lustre --fsname testfs --ost --mgsnode=mds0@tcp0 --mgsnode=mds1@tcp0 /dev/sda

Whenever you mount the filesystem, mount it this way:

mount -t lustre mds0@tcp0:mds1@tcp0:/testfs /mnt/testfs

Jeffrey A. Bennett
HPC Systems Engineer
San Diego Supercomputer Center
http://users.sdsc.edu/~jab
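A minimal sketch of the pattern in Jeffrey's example: one `--mgsnode` option per MGS node. The hypothetical `ost_mkfs_cmd` helper only prints the command; the host names, filesystem name and device come from his example and are placeholders. If an MGS node is reachable on more than one network, its NIDs would presumably go comma-separated into that node's single `--mgsnode` value, the same convention as `--failnode`; that is an assumption, not something stated in the thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an mkfs.lustre command for an OST that lists
# every MGS node. Nothing is executed.
#   $1     filesystem name
#   $2     OST device (placeholder path)
#   $3...  one argument per MGS node
ost_mkfs_cmd() {
    local fsname=$1 dev=$2; shift 2
    local cmd="mkfs.lustre --fsname $fsname --ost"
    local node
    for node in "$@"; do
        cmd+=" --mgsnode=$node"    # one --mgsnode option per MGS node
    done
    echo "$cmd $dev"
}

ost_mkfs_cmd testfs /dev/sda mds0@tcp0 mds1@tcp0
# -> mkfs.lustre --fsname testfs --ost --mgsnode=mds0@tcp0 --mgsnode=mds1@tcp0 /dev/sda
```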
John White
2009-Nov-24 23:06 UTC
Okay, thanks Jeff. This opens up another question: can you fail over between protocols? I have clients that have both o2ib and tcp connectivity to the servers; can I do:

mount -t lustre mds0@o2ib:mds1@o2ib:mds0@tcp0:mds1@tcp0:/testfs /mnt/testfs

Will state be preserved between protocols, or is this just entirely insane?
Andreas Dilger
2009-Nov-25 02:46 UTC
On 2009-11-24, at 16:06, John White wrote:
> Okay, thanks Jeff. This opens up another question... can you fail
> between protocols?
> I have clients that have both o2ib and tcp connectivity to the
> servers, can I do:
>
> mount -t lustre mds0@o2ib:mds1@o2ib:mds0@tcp0:mds1@tcp0:/testfs /mnt/testfs
>
> Will state be preserved between protocols or is this just entirely
> insane?

Lustre can do this, but it isn't a normal config. Note that in the case of multiple interfaces for the same node there is a slightly different syntax for the mount... I _believe_ (though I don't have the info handy right now) that you separate NIDs for the same node with commas, and different nodes with a colon, so you can try:

mount -t lustre mds0@o2ib,mds0@tcp:mds1@o2ib0,mds1@tcp0:/test /mnt/test

I'm not 100% sure of that.

Note that this may double your client's failover time, because it now has 4 addresses to try when reconnecting instead of 2.
Cheers, Andreas
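A minimal sketch of the syntax Andreas describes, which he was not certain of, so treat it as an assumption: commas between NIDs of the same node, colons between different nodes. The hypothetical `mgs_spec` helper builds the device argument for `mount -t lustre` from one argument per MGS node, and the hypothetical `nid_count` counts the addresses in it, i.e. the candidates a client may cycle through when reconnecting: 4 for two nodes on two networks, versus 2 for the single-network `mds0@tcp0:mds1@tcp0:/testfs` form in Jeffrey's example.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the "device" argument for mount -t lustre.
#   $1     filesystem name
#   $2...  one argument per MGS node, its NIDs already comma-separated
# NIDs of one node are joined by ",", different nodes by ":".
mgs_spec() {
    local fsname=$1; shift
    local IFS=:          # makes "$*" join the nodes with colons
    echo "$*:/$fsname"
}

# Count every NID in a spec: each is an address a client may try.
nid_count() {
    local nodes=${1%:/*}             # strip the trailing ":/fsname"
    local -a nids
    IFS=',:' read -r -a nids <<< "$nodes"
    echo "${#nids[@]}"
}

spec=$(mgs_spec test mds0@o2ib,mds0@tcp0 mds1@o2ib,mds1@tcp0)
echo "mount -t lustre $spec /mnt/test"
# -> mount -t lustre mds0@o2ib,mds0@tcp0:mds1@o2ib,mds1@tcp0:/test /mnt/test
nid_count "$spec"    # -> 4
```

The count only reflects how many addresses are listed; it says nothing about which one LNET will actually use first.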
Kevin Van Maren
2009-Nov-27 16:48 UTC
The Lustre networking model is that only a single connection will be used between a client and a server: Lustre picks the "best" of the available options and does not fall back to the others. See https://bugzilla.lustre.org/show_bug.cgi?id=19854

Note that the client only registers one NID with the server, so while the client can reconnect (with some patches and games played with the NIDs), the server cannot reconnect to the client through a different path. The client could therefore still be evicted if the server needs to, e.g., revoke a lock.

Kevin