John White
2011-Nov-02 15:55 UTC
[Lustre-discuss] problem with clients and multiple transports
Hello Folks,

I'm having a problem pinning a given FS to a given transport type. I have 2 file systems, each hosted on the OSS/MDS side with both an ib and a tcp NID, and a client that mounts each one specifying the appropriate transport (one over tcp and one over o2ib).

I cannot mount o2ib. Period. If my client's modprobe.conf lists o2ib before the tcp NID, any o2ib mount request fails. If the modprobe.conf has tcp first, both mount requests will mount via tcp0, no matter how the mount request is crafted.

So, am I just missing documentation somewhere that states you can only use a single transport on a client at a time, or is this a bug?

----------------
John White
HPC Systems Engineer
(510) 486-7307
One Cyclotron Rd, MS: 50C-3396
University of California Berkeley
Berkeley, CA 94720
Ben Evans
2011-Nov-02 16:01 UTC
[Lustre-discuss] problem with clients and multiple transports
My guess is you've got two "options lnet" lines in modprobe.conf.

It should look something like this:

options lnet networks=tcp0(eth0),o2ib0(ib0)
options ko2iblnd ipif_name=ib0
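A quick way to check for the duplicate-line situation described above is sketched below. This is only an illustration, not part of the original reply: the function names are made up, and the parsing assumes an unquoted networks= value with a single interface per network, as in the example lines.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Lustre).

# Count active (uncommented) "options lnet" lines in a modprobe config file.
# More than one active line is the situation guessed at above.
count_lnet_options() {
    grep -c -E '^[[:space:]]*options[[:space:]]+lnet[[:space:]]' "$1"
}

# Print each entry of networks=..., one per line, in the configured order.
# Assumes an unquoted value and one interface per network, so that commas
# only separate networks.
list_lnet_networks() {
    grep -E '^[[:space:]]*options[[:space:]]+lnet[[:space:]]' "$1" |
        sed -n 's/.*networks=\([^[:space:]]*\).*/\1/p' |
        tr ',' '\n'
}

# Example (path is an assumption, adjust to your distribution):
#   count_lnet_options /etc/modprobe.conf
#   list_lnet_networks /etc/modprobe.conf
```

After reloading the modules, `lctl list_nids` on the client should report one NID per configured network, which is an easy cross-check that both tcp and o2ib actually came up.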
John White
2011-Nov-02 16:05 UTC
[Lustre-discuss] problem with clients and multiple transports
I did have that, with the 2nd line commented out. I just uncommented the 2nd, unloaded, depmod'd and reloaded, only to see the same behavior.
David Dillow
2011-Nov-03 02:11 UTC
[Lustre-discuss] problem with clients and multiple transports
Were both tcp and o2ib LNETs present when you built the filesystem? If not, try performing a 'tunefs.lustre --writeconf ...' on each of the servers with LNET started but the servers unmounted.

--
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
John White
2011-Nov-04 22:07 UTC
[Lustre-discuss] problem with clients and multiple transports
Yes, all OSTs and MDTs were formatted with a full complement of transports.

----------------
John White
HPC Systems Engineer
(510) 486-7307
One Cyclotron Rd, MS: 50C-3396
Lawrence Berkeley National Lab
Berkeley, CA 94720
Indivar Nair
2011-Nov-08 15:38 UTC
[Lustre-discuss] problem with clients and multiple transports
Hi John,

If ib is not working with Lustre at all, then I hope you have recompiled Lustre to use the OFED modules included with the Lustre kernel. Your normal ib network might be working, but it won't work with Lustre unless you recompile it too. Lustre looks at it as a normal TCP/IP interface, therefore you are able to mount using tcp, but not infiniband.

Check these links out:

https://blogs.oracle.com/atulvid/entry/compiling_lustre_from_sources_with
http://wiki.lustre.org/index.php/Configuring_InfiniBand_Connectivity
http://pkg-ofed.alioth.debian.org/howto/infiniband-howto-9.html#ss9.4

Hope I am on the right track :-).

Regards,
Indivar Nair
Thomas Roth
2011-Nov-09 17:13 UTC
[Lustre-discuss] problem with clients and multiple transports
Your clients have both ib and tcp NIDs? I ask because I encountered strange behavior when trying to mount an ib based FS and a tcp based FS on the same (ethernet-only) client. To connect to the ib MDS it had to go through an lnet router, of course.

Experimentally, I found

options lnet networks=tcp0(eth0:0),tcp1(eth0) routes="o2ib 10.81.2.108@tcp1; tcp 10.12.0.1@tcp1"

to be working.

This client sits with its eth0 interface in the 10.12.x.y subnet; 10.81.2.108 is the IP of the lnet router. I have no idea why a second IP on that client should be necessary, but without it, no connection could be established.

Also, the order 'networks=tcp0(eth0:0),tcp1(eth0)' proved to be important; 'tcp1(eth0),tcp0(eth0:0)' didn't work either.

But I'm afraid this is an unrelated problem (the two Lustres are separate FS, different MGSes..)

Thomas
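To make the routes= value above easier to read, here is a small sketch that splits it into one line per route. The function name is made up for illustration, and it assumes the usual "remote-net [hops] gateway-NID" layout with entries separated by semicolons.

```shell
#!/bin/sh
# Hypothetical helper: print each entry of an LNet routes string as
# "<remote net> via <gateway NID>". The gateway is taken as the last field
# so that an optional hop count in the middle is skipped.
print_routes() {
    printf '%s\n' "$1" | tr ';' '\n' | awk 'NF >= 2 { print $1, "via", $NF }'
}

# Example with the routes string from the message above:
#   print_routes 'o2ib 10.81.2.108@tcp1; tcp 10.12.0.1@tcp1'
```

Read this way, both the o2ib network and the tcp (tcp0) network are reached through gateways on tcp1, which is the network bound to eth0 in that configuration.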
John White
2011-Nov-22 15:45 UTC
[Lustre-discuss] problem with clients and multiple transports
Unfortunately, no lnet routers are involved. If I just have o2ib NIDs available on the client, the IB connected FS mounts fine. If I have TCP NIDs listed first in the modprobe.conf along with o2ib, IB does not mount.
John White
2011-Nov-30 21:55 UTC
[Lustre-discuss] problem with clients and multiple transports
This same issue is occurring on a freshly created fs using the commands below:

mkfs.lustre --mgs --mdt --fsname=lr1 --verbose /dev/dm-1
format-luns.sh --ost --mgsnode=10.4.200.9@o2ib,10.0.200.9@tcp --failnode=10.4.200.10@o2ib,10.0.200.10@tcp --failnode=10.4.200.11@o2ib,10.0.200.11@tcp --verbose --fsname=lr1

format-luns.sh is just a script that takes a peek at heartbeat configs and mkfs's luns accordingly using the specified options.

I'm really at a complete loss here. Even specifying "mount -t lustre 10.4.200.9@o2ib:/lr1 /mnt" will result in a mount that uses tcp NIDs for any RPCs.
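For anyone trying to confirm the "mounts but uses tcp NIDs" symptom, the sketch below tallies client connections by LNet network. It is an assumption-laden illustration rather than anything from the thread: the function name is made up, and it expects "param=NID" lines such as those printed by `lctl get_param` for the osc ost_conn_uuid and mdc mds_conn_uuid parameters, whose names may differ between Lustre versions.

```shell
#!/bin/sh
# Hypothetical helper: read "param=NID" lines on stdin and print
# "<count> <network>" for each LNet network seen after the '@'.
summarize_conn_nets() {
    sed -n 's/.*=.*@\([A-Za-z0-9]*\)[[:space:]]*$/\1/p' |
        sort | uniq -c | awk '{ print $1, $2 }'
}

# Example on a mounted client (parameter names are an assumption):
#   lctl get_param 'osc.*.ost_conn_uuid' 'mdc.*.mds_conn_uuid' | summarize_conn_nets
```

If every connection for lr1 lands on tcp despite the o2ib mount string, that matches the behavior described above and gives a concrete before/after check when changing the networks= order.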