Joe Little
2008-Feb-10 04:16 UTC
[Lustre-discuss] multihomed clients ignoring lnet options
I have all of my servers and clients using eth1 for the tcp lustre lnet.

All have modprobe.conf entries of:

options lnet networks="tcp0(eth1)"

and all report with "lctl list_nids" that they are using the IP address associated with that interface (a net 192.168.200.x address).

However, when my client connects, it ignores the above and goes with eth0 for routing, even though the mds/mgs is on that network range:

client dmesg:

Lustre: 4756:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Lustre: Added LNI 192.168.200.100@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: OBD class driver, info at clusterfs.com
Lustre Version: 1.6.4.2
Build Version: 1.6.4.2-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.9.lustre.linux-2.6.9-55.0.9.EL_lustre.1.6.4.2smp
Lustre: Lustre Client File System; info at clusterfs.com
LustreError: 4799:0:(socklnd_cb.c:2167:ksocknal_recv_hello()) Error -104 reading HELLO from 192.168.2.201
LustreError: 11b-b: Connection to 192.168.2.201@tcp at host 192.168.2.201 on port 988 was reset: is it running a compatible version of Lustre and is 192.168.2.201@tcp one of its NIDs?

server dmesg:

LustreError: 120-3: Refusing connection from 192.168.2.192 for 192.168.2.201@tcp: No matching NI
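The NID the client is told to contact (192.168.2.201) is not on the network that lnet was bound to through eth1, which is consistent with the server's "No matching NI" refusal. A minimal shell sketch of that comparison follows. It assumes the eth1 network is 192.168.200.0/24 (the post only says 192.168.200.x), and nid_on_net is a hypothetical helper, not a Lustre tool.

```shell
#!/bin/sh
# nid_on_net NID PREFIX
# Succeeds when the address part of NID (the text before '@') starts with PREFIX.
# Hypothetical helper; assumes a /24 network written as its first three octets.
nid_on_net() {
    case "${1%@*}" in
        "$2".*) return 0 ;;
        *)      return 1 ;;
    esac
}

LNET_PREFIX=192.168.200   # eth1 network from the post, assumed to be a /24

# NIDs taken from the client dmesg output in the post.
for nid in 192.168.200.100@tcp 192.168.2.201@tcp; do
    if nid_on_net "$nid" "$LNET_PREFIX"; then
        echo "$nid: on the eth1 LNET network"
    else
        echo "$nid: NOT on the eth1 LNET network"
    fi
done
```

On a live node, the NIDs reported by "lctl list_nids" can be fed to the same check in place of the values copied from the logs.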
Joe Little
2008-Feb-10 05:58 UTC
[Lustre-discuss] multihomed clients ignoring lnet options
Never mind. The problem was resolved by recreating the MGS and the OSTs using the same parameters on the server. I was able to change the parameters and still have the servers working, but my guess is that those options are permanently etched into the filesystem.
Aaron Knister
2008-Feb-10 15:43 UTC
[Lustre-discuss] multihomed clients ignoring lnet options
I believe that's correct. The nids of the various server components are stored on the filesystem itself.

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
aaron at iges.org
Cliff White
2008-Feb-12 04:00 UTC
[Lustre-discuss] multihomed clients ignoring lnet options
Aaron Knister wrote:
> I believe that's correct. The nids of the various server components
> are stored on the filesystem itself.

Yes, and you can always see them with

tunefs.lustre --print <device>

cliffw
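The full --print output covers more than NIDs, so it can help to keep only the lines that mention one. A small sketch, assuming nothing about the output beyond NIDs appearing in address@network form; nid_lines is a hypothetical helper, and the two sample lines are illustrative input, not captured output:

```shell
#!/bin/sh
# Keep only lines that mention an LNET NID such as 192.168.2.201@tcp.
# Hypothetical helper: matches a dotted-quad address followed by '@' and a network name.
nid_lines() {
    grep -E '[0-9]+(\.[0-9]+){3}@[a-z0-9]+'
}

# On a server, with the real target device substituted for the placeholder:
#   tunefs.lustre --print <device> | nid_lines

# Illustrative input only; real output contains more fields.
printf '%s\n' \
    'Target: example-OST0000' \
    'Parameters: mgsnode=192.168.2.201@tcp' | nid_lines
```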
Joe Little
2008-Feb-12 04:51 UTC
[Lustre-discuss] multihomed clients ignoring lnet options
On Feb 11, 2008 8:00 PM, Cliff White <Cliff.White at sun.com> wrote:
> Yes, and you can always see them with
> tunefs.lustre --print <device>

Is there any way to change them after the fact?
Steden Klaus
2008-Feb-12 04:53 UTC
[Lustre-discuss] multihomed clients ignoring lnet options
If you have root, you can change them using tunefs.lustre after the file system has been shut down. I've done this a number of times to test various lnet configs.

Klaus
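A rough sketch of what such a change might look like for one OST, with clients and all targets unmounted first. The device path and the new MGS NID are placeholders, not values from this thread. The script only prints the command unless RUN=1 is set, since --erase-params discards every stored parameter (any failover or other settings would have to be supplied again) and --writeconf regenerates the configuration logs.

```shell
#!/bin/sh
# Print commands by default; execute them only when RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Placeholders (assumptions, not taken from the thread):
NEW_MGS_NID=${NEW_MGS_NID:-192.168.200.201@tcp0}   # hypothetical eth1 NID of the MGS
OST_DEV=${OST_DEV:-/dev/sdX}                       # hypothetical OST block device

# Replace the stored parameters with the new MGS NID and rewrite the config logs.
run tunefs.lustre --erase-params --mgsnode="$NEW_MGS_NID" --writeconf "$OST_DEV"
```

The same step would be repeated per target, and "tunefs.lustre --print" on each device shows whether the stored NIDs now match the eth1 network before anything is remounted.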