Hi,

I'm trying to configure Lustre 1.8.1 with a Voltaire InfiniBand network. On the MGS, MDS and OSSs I have two interfaces: eth1 and ib0. I've successfully completed a test using eth1, mounting a filesystem on a client node. Now I want to do the same thing over Voltaire InfiniBand (ib0), so I modified modprobe.conf on both servers and clients with the line:

options lnet networks=tcp(ib0)

When I try to mount the FS on the client node nothing happens, and I find the following errors in the syslog:

Oct 5 13:08:12 xc264 kernel: Lustre: 5468:0:(linux-tcpip.c:688:libcfs_sock_connect()) Error -101 connecting 0.0.0.0/1023 -> 172.31.1.25/988
Oct 5 13:08:12 xc264 kernel: Lustre: 5468:0:(acceptor.c:95:lnet_connect_console_error()) Connection to 172.31.1.25@tcp at host 172.31.1.25 was unreachable: the network or that node may be down, or Lustre may be misconfigured.
Oct 5 13:08:12 xc264 kernel: Lustre: 5468:0:(socklnd_cb.c:421:ksocknal_txlist_done()) Deleting packet type 1 len 368 172.31.65.24@tcp->172.31.1.25@tcp
Oct 5 13:08:17 xc264 kernel: Lustre: 5474:0:(client.c:1383:ptlrpc_expire_one_request()) @@@ Request x1315690795499541 sent from lustre-MDT0000-mdc-ffff81021f97e400 to NID 172.31.1.25@tcp 5s ago has timed out (limit 5s).
Oct 5 13:08:17 xc264 kernel: req@ffff81021828fc00 x1315690795499541/t0 o38->lustre-MDT0000_UUID@172.31.1.25@tcp:12/10 lens 368/584 e 0 to 1 dl 1254740897 ref 1 fl Rpc:N/0/0 rc 0/0

The main problem is shown on the first line. The MGS ib0 address is 172.31.65.25 but, as you can see, the client always tries to connect to the eth1 address (172.31.1.25), even after shutting down eth1.

With vib(ib0) in modprobe.conf the problem is different. I've also tried changing the options line to:

options lnet networks=vib(ib0)

but in the syslog I found:

Oct 5 12:33:19 xc264 kernel: LustreError: 4864:0:(api-ni.c:1043:lnet_startup_lndnis()) Can't load LND vib, module kviblnd

Are there other configuration steps for IB that I forgot to make?

Thanks in advance for your help
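The diagnosis in the post (the client dials the eth1 address instead of the ib0 address) can be checked mechanically against the syslog. The following is only a minimal sketch: the function name `connect_targets`, the regular expression, and the constant `EXPECTED_MGS_IB0` are illustrative assumptions and not part of Lustre. On Linux, error -101 corresponds to ENETUNREACH ("Network is unreachable").

```python
import re

# Matches the address pair in libcfs_sock_connect() error lines such as
# "Error -101 connecting 0.0.0.0/1023 -> 172.31.1.25/988".
# The pattern is an assumption based on the log lines quoted in this thread.
CONNECT_RE = re.compile(
    r"Error (?P<errno>-?\d+) connecting "
    r"(?P<src>[\d.]+)/(?P<sport>\d+) -> (?P<dst>[\d.]+)/(?P<dport>\d+)"
)


def connect_targets(log_text):
    """Return (errno, destination IP, destination port) for each failed connect."""
    return [
        (int(m.group("errno")), m.group("dst"), int(m.group("dport")))
        for m in CONNECT_RE.finditer(log_text)
    ]


# First log line from the post, copied verbatim.
SAMPLE = (
    "Oct 5 13:08:12 xc264 kernel: Lustre: "
    "5468:0:(linux-tcpip.c:688:libcfs_sock_connect()) Error -101 connecting "
    "0.0.0.0/1023 -> 172.31.1.25/988"
)

# ib0 address of the MGS, as stated in the post.
EXPECTED_MGS_IB0 = "172.31.65.25"

for errno, dst, port in connect_targets(SAMPLE):
    status = "ok" if dst == EXPECTED_MGS_IB0 else "NOT the ib0 address"
    print(f"errno {errno}: client dialed {dst}:{port} ({status})")
```

Run against the quoted line, this reports that the client dialed 172.31.1.25 on port 988, which is the eth1 address rather than the expected 172.31.65.25.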
On 10/5/09 7:58 AM, "Aielli Roberto" <r.aielli at cineca.it> wrote:

> Now I want to do the same thing with Voltaire infiniband
> (ib0) modifying the modprobe.conf on both servers and clients with the line:
>
> options lnet networks=tcp(ib0)

This would use IPoIB, not native InfiniBand. Is that what you want? To do native InfiniBand, assuming you are using OFED, you need to specify:

options lnet networks=o2ib(ib0)

Did you unload and reload the Lustre modules after changing modprobe.conf.local? If not, the changes in modprobe.conf.local would not be picked up.

> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
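The distinction drawn in this reply, tcp(ib0) meaning sockets over IPoIB and o2ib(ib0) meaning native InfiniBand, can be made explicit with a small parser for the `options lnet networks=` line. This is an illustrative sketch only: `describe_networks` and `LND_NOTES` are hypothetical names, the regular expressions are assumptions about the line format, and only the two network types named in the reply are annotated.

```python
import re

# Notes for the two network types discussed in the reply; anything else
# is deliberately left undescribed by this sketch.
LND_NOTES = {
    "tcp": "socklnd: TCP/IP sockets (IPoIB when the interface is ib0)",
    "o2ib": "o2iblnd: native InfiniBand via the OFED stack",
}

# Assumed shape of the modprobe line: "options lnet networks=<value>".
NETWORKS_RE = re.compile(r"^\s*options\s+lnet\s+networks=(?P<value>\S+)\s*$")
# One entry looks like "tcp(ib0)" or "o2ib0(ib0)".
ENTRY_RE = re.compile(r"([a-z0-9]+)\(([^)]*)\)")


def describe_networks(line):
    """Return (network, interfaces, note) for each entry in a networks= line."""
    match = NETWORKS_RE.match(line)
    if match is None:
        raise ValueError(f"not an 'options lnet networks=' line: {line!r}")
    described = []
    for net, ifaces in ENTRY_RE.findall(match.group("value")):
        lnd = net.rstrip("0123456789")  # "o2ib0" -> "o2ib", "tcp1" -> "tcp"
        note = LND_NOTES.get(lnd, "not covered in this sketch")
        described.append((net, ifaces, note))
    return described


for conf_line in (
    "options lnet networks=tcp(ib0)",
    "options lnet networks=o2ib(ib0)",
):
    for net, ifaces, note in describe_networks(conf_line):
        print(f"{net} on {ifaces}: {note}")
```

The lookup strips a trailing network number before matching, so numbered forms such as tcp0 or o2ib0 are described the same way as the unnumbered ones.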
Hopefully this is a silly question for you.

When you changed your Lustre set-up to use ib0 in place of eth1, did you run the writeconf command again on the MDS/MDT and the OSS/OST so that they point to the new address?

I'm kind of assuming you have, but I thought I would ask the question.

Cheers!
megan

Message: 2
Date: Mon, 5 Oct 2009 14:58:41 +0200 (MEST)
From: Aielli Roberto <r.aielli at cineca.it>
Subject: [Lustre-discuss] Lustre voltaire configuration
To: lustre-discuss at lists.lustre.org
Message-ID: <4AC9ED80.3040001 at cineca.it>
Content-Type: text/plain; charset=ISO-8859-1

Hi,
I'm trying to configure Lustre 1.8.1 with a Voltaire infiniband network.
[...]

End of Lustre-discuss Digest, Vol 45, Issue 5
Megan,

After the writeconf and a few other configuration changes I've managed to make it work with IPoIB.

Thanks,
Roberto

Ms. Megan Larko wrote:
> Hopefully this is a silly question for you.
>
> When you changed your lustre set-up to use the ib0 in place of the
> eth1, did you do the updated writeconf command on the MDS/MDT and the
> OSS/OST to have them point to the new address?
>
> I'm kind of assuming you have but I thought I would ask the question.
>
> Cheers!
> megan
Hi Dennis,

I want to test both configurations: IPoIB first, and RDMA afterwards. I've reloaded the modules after the modification, but probably some more configuration needs to be done.

Thanks,
Roberto

Dennis Nelson wrote:
> On 10/5/09 7:58 AM, "Aielli Roberto" <r.aielli at cineca.it> wrote:
>
>> options lnet networks=tcp(ib0)
>
> This would use IPoIB, not native infiniband. Is that what you want? To do
> native infiniband, assuming you are using OFED, you need to specify:
>
> options lnet networks=o2ib(ib0)
>
> Did you unload/reload the Lustre modules after changing modprobe.conf.local?
> If not, it would not recognize the changes in modprobe.conf.local.