Hi all,

I'm getting my feet wet in the InfiniBand lake and of course I've run into some problems. It seems I got the compilation part (SLES11 kernel 2.6.27 + Lustre 1.8.3 + OFED 1.4.2) right, because it lets me see and use the InfiniBand fabric, and because ko2iblnd loads without any complaints.

In /etc/modprobe.d/lustre (this is a Debian system, hence this subdirectory of modprobe configs), I have

  options ip2nets="o2ib0 192.168.0.[1-5]"

I load lnet and do 'lctl network up', but then 'lctl list_nids' invariably gives me only

  192.168.0.1@tcp

no matter how I twist the modprobe config (ip2nets="o2ib", network="o2ib", network="o2ib(ib0)", etc.).

This is true as long as I have ib0 configured with the IP 192.168.0.1. Once I unconfigure it, I get, quite expectedly,

  LNET configure error 100: Network is down

So either I configure IPoIB and bring up the network, but it uses tcp, or I don't configure ib0 and then cannot start the network -? ;-{} I think I'm missing something here. Any clues?

Cheers,
Thomas
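For reference, an ip2nets value pairs an LNET network with an IP address pattern, and a node takes the network of a rule whose pattern matches one of its own interface addresses. The following is a simplified, hypothetical matcher (not LNET's own parser) that understands only plain numbers, '*', and bracketed ranges or lists in each octet, and picks the first matching ';'-separated rule. It suggests that 192.168.0.1 does fall inside 192.168.0.[1-5], so the address range itself is probably not the problem.

```python
from typing import List, Optional


def octet_matches(pattern: str, value: int) -> bool:
    """Match one octet against '*', a plain number, or a bracket list like [1-5] or [1,3-5]."""
    if pattern == "*":
        return True
    if pattern.startswith("[") and pattern.endswith("]"):
        for item in pattern[1:-1].split(","):
            if "-" in item:
                lo, hi = (int(x) for x in item.split("-", 1))
                if lo <= value <= hi:
                    return True
            elif int(item) == value:
                return True
        return False
    return int(pattern) == value


def ip_matches(pattern: str, ip: str) -> bool:
    """True if a dotted-quad IP matches a four-part pattern such as 192.168.0.[1-5]."""
    pat_parts = pattern.split(".")
    ip_parts = [int(p) for p in ip.split(".")]
    if len(pat_parts) != 4 or len(ip_parts) != 4:
        return False
    return all(octet_matches(p, v) for p, v in zip(pat_parts, ip_parts))


def select_network(ip2nets: str, local_ips: List[str]) -> Optional[str]:
    """Return the network of the first ';'-separated rule matching any local IP, else None."""
    for rule in ip2nets.split(";"):
        fields = rule.split()
        if not fields:
            continue
        net, patterns = fields[0], fields[1:]
        if any(ip_matches(p, ip) for p in patterns for ip in local_ips):
            return net
    return None


print(select_network("o2ib0 192.168.0.[1-5]", ["192.168.0.1"]))
```

The real ip2nets syntax accepts more forms than this sketch handles, so treat it as an illustration of the matching idea only.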
Hey Thomas,

Are you trying to connect to Lustre via both IB and Ethernet? If so, your modprobe config should look like this:

  options lnet networks="o2ib0(ib0),tcp0(eth0)"

If you're IB only, use:

  options lnet networks="o2ib0(ib0)"

If your MDS and OSS servers are on a separate network from the clients, you'll need to do something different. Let's say the MDS and OSSs are on o2ib0/tcp0 and the clients are on o2ib1/tcp1. You'll need a router server with separate addresses on o2ib0 and o2ib1. It's also important to note that o2ib0 and o2ib1 should be different IP address spaces.

On the clients:

  # I live on o2ib1
  options lnet networks="o2ib1(ib0),tcp1(eth0)"
  # To get to o2ib0 go through IP.ADD.OF.ROUTER@o2ib1
  options lnet routes="o2ib0 IP.ADD.OF.ROUTER@o2ib1"

On the servers:

  # I live on o2ib0
  options lnet networks="o2ib0(ib0),tcp0(eth0)"
  # To get to o2ib1 go through IP.ADD.OF.ROUTER@o2ib0
  options lnet routes="o2ib1 IP.ADD.OF.ROUTER@o2ib0"

IP.ADD.OF.ROUTER@o2ib0 and IP.ADD.OF.ROUTER@o2ib1 are different IPs on distinct networks.

'lctl list_nids' shows the Lustre NIDs of the node you're logged into only. 'lctl route_list' shows the Lustre routers and the networks they bridge.

I hope this was helpful.

Erik

On Tue, Jun 22, 2010 at 10:19 AM, Thomas Roth <t.roth at gsi.de> wrote:
> [...]
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
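Erik's routed layout can be sanity-checked with a few lines of code. The helper below is hypothetical (it is not part of Lustre). It assembles the 'options lnet' lines for one node and rejects a route whose gateway NID is not on one of the node's own networks, since such a gateway could not be reached directly. The 10.x addresses are made-up stand-ins for IP.ADD.OF.ROUTER, chosen so that o2ib0 and o2ib1 use different IP address spaces.

```python
from typing import Dict, List


def normalize_net(net: str) -> str:
    """Treat a bare network type such as 'o2ib' as network number 0 ('o2ib0')."""
    return net if net[-1].isdigit() else net + "0"


def lnet_options(networks: List[str], routes: Dict[str, str]) -> List[str]:
    """Build 'options lnet' lines from specs like 'o2ib1(ib0)' and a remote-net -> gateway-NID map."""
    local_nets = {normalize_net(spec.split("(", 1)[0]) for spec in networks}
    for remote, gateway in routes.items():
        gw_net = normalize_net(gateway.rsplit("@", 1)[1])
        # A gateway is only reachable if it sits on one of this node's own networks.
        if gw_net not in local_nets:
            raise ValueError(f"gateway {gateway} for {remote} is not on a local network")
    lines = [f'options lnet networks="{",".join(networks)}"']
    if routes:
        spec = "; ".join(f"{remote} {gateway}" for remote, gateway in routes.items())
        lines.append(f'options lnet routes="{spec}"')
    return lines


# Hypothetical router addresses standing in for IP.ADD.OF.ROUTER on each IB network.
client = lnet_options(["o2ib1(ib0)", "tcp1(eth0)"], {"o2ib0": "10.1.0.254@o2ib1"})
server = lnet_options(["o2ib0(ib0)", "tcp0(eth0)"], {"o2ib1": "10.0.0.254@o2ib0"})
print("\n".join(client + server))
```

Swapping the two gateway NIDs (for example giving the client a gateway on o2ib0) makes the helper raise ValueError, which is the kind of mix-up Erik's note about distinct router IPs is meant to prevent.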
Hello Erik,

thanks for your advice, especially on routing; I'll study that carefully once I get that far.

For now, I was just trying the minimal first steps to get LNET running over IB:
- It's all happening on the MGS/MDS, but neither the MGS nor the MDT is mounted yet; I just run 'modprobe lnet; lctl network up; lctl list_nids'.
- I tried to use IB exclusively.
- options lnet networks="o2ib0(ib0)" doesn't work either (nor do variations thereof).

Regards,
Thomas

On 22.06.2010 18:40, Erik Froese wrote:
> [...]

-- 
--------------------------------------------------------------------
Thomas Roth
Department: Information Technology
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
64291 Darmstadt
www.gsi.de

Limited liability company (Gesellschaft mit beschränkter Haftung)
Registered office: Darmstadt
Commercial register: Amtsgericht Darmstadt, HRB 1528

Managing directors: Professor Dr. Dr. h.c. Horst Stöcker, Christiane Neumann, Dr. Hartmut Eickhoff

Chair of the Supervisory Board: Dr. Beatrix Vierkorn-Rudolph
Deputy chair: Ministerialdirigent Dr. Rolf Bernhardt
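Thomas's minimal test comes down to one question: does 'lctl list_nids' report an o2ib NID? A small, hypothetical check over captured output is sketched below. It assumes one NID per line in ADDRESS@NETWORK form, with network number 0 possibly printed without its digit, as in the 192.168.0.1@tcp seen in this thread.

```python
from typing import List


def parse_nids(output: str) -> List[str]:
    """Split captured 'lctl list_nids' output into NIDs, one per non-empty line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def has_net_type(output: str, net_type: str = "o2ib") -> bool:
    """True if any NID is on a network of the given type, e.g. 'o2ib' matches @o2ib and @o2ib1."""
    for nid in parse_nids(output):
        net = nid.rsplit("@", 1)[-1]
        if net.rstrip("0123456789") == net_type:
            return True
    return False


# Output reported in the thread: only a tcp NID, no o2ib one.
print(has_net_type("192.168.0.1@tcp\n"))
```

On a Lustre node the text could be captured after 'lctl network up' with subprocess.run(["lctl", "list_nids"], capture_output=True, text=True).stdout; that step needs Lustre installed and is left out of the sketch.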
Thomas,

If you see an ib0 device and it has a valid IP, LNET should pick it up with

  options lnet networks="o2ib0(ib0)"

What errors are you seeing?

Erik

On Tue, Jun 22, 2010 at 1:14 PM, Thomas Roth <t.roth at gsi.de> wrote:
> [...]
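Erik's precondition (an ib0 device with a valid IP) can be checked from the output of 'ip -o -4 addr show dev ib0', which prints one line per address. The parser below is a sketch that assumes that one-line layout; the sample line is illustrative, built from the 192.168.0.1 address mentioned in the thread rather than captured from Thomas's machine.

```python
import ipaddress
from typing import List


def ipv4_addresses(ip_output: str, ifname: str = "ib0") -> List[str]:
    """Collect IPv4 addresses of ifname from 'ip -o -4 addr show' style output."""
    addrs = []
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected shape: '<index>: <ifname> inet <address>/<prefix> ...'
        if len(fields) >= 4 and fields[1] == ifname and fields[2] == "inet":
            addrs.append(str(ipaddress.ip_interface(fields[3]).ip))
    return addrs


sample = "5: ib0    inet 192.168.0.1/24 brd 192.168.0.255 scope global ib0\n"
print(ipv4_addresses(sample))
```

An empty result would mean ib0 has no IPv4 address, which matches the "Network is down" case Thomas describes.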
Hi Thomas,

Here's one thing to check (if you're trying to replace a tcp network with an IB one on an existing Lustre filesystem):

With the Lustre targets unmounted, run:

  tunefs.lustre --dryrun <DEV_PATH> | grep Parameters

Check that parameters like 'mgsnode=IP' end in @o2ib and not @tcp. If they end in @tcp, erase and rewrite them.

Cheers,
Adam

Erik Froese wrote:
> [...]

-- 
Adam Munro
System Administrator | SHARCNET | http://www.sharcnet.ca
Compute Canada | http://www.computecanada.org
519-888-4567 x36453
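Adam's check can be automated over the 'Parameters' line that 'tunefs.lustre --dryrun' prints. The sketch below is hypothetical. It assumes space-separated key=value pairs, treats only mgsnode and failover.node as NID-bearing keys, assumes ':' separates alternative nodes and ',' separates NIDs of one node, and flags every NID that is not on an o2ib network. It only reads text; rewriting the parameters is still a manual step.

```python
from typing import List, Tuple

NID_KEYS = ("mgsnode", "failover.node")


def non_ib_nids(parameters_line: str) -> List[Tuple[str, str]]:
    """Return (key, nid) pairs from a 'Parameters:' line whose NID is not on an o2ib network."""
    text = parameters_line.split("Parameters:", 1)[-1]
    flagged = []
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep or key not in NID_KEYS:
            continue
        # Assumed separators: ':' between alternative nodes, ',' between NIDs of one node.
        for node in value.split(":"):
            for nid in node.split(","):
                if not nid.rsplit("@", 1)[-1].startswith("o2ib"):
                    flagged.append((key, nid))
    return flagged


print(non_ib_nids("Parameters: mgsnode=192.168.0.1@tcp"))
```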
Hi Adam, Erik,

I have already varied that: "--mgsnode=IB" or "--mgsnode=IB --failnode=tcp" etc. in the config of the MDT. But I don't go as far as mounting either the MGS or the MDT. I'm just loading lnet and then using 'lctl' to start the network:

  lctl network up
  lctl list_nids

Whatever I put in modprobe.conf, I get the answer "192.168.0.1@tcp".

Regards,
Thomas

On 23.06.2010 20:55, Adam wrote:
> [...]
Hi!

On Tue, Jun 22, 2010 at 04:19:08PM +0200, Thomas Roth wrote:
> [...]
> In /etc/modprobe.d/lustre (this is a Debian system, hence this subdir of
> modprobe-configs), I have
>   options ip2nets="o2ib0 192.168.0.[1-5]"

If this is a verbatim copy from the config file, then you're missing the name of the module, i.e. 'options lnet ip2nets=...'. Maybe also double-check with 'modprobe -c' that the options get passed on as intended.

> I load lnet and do 'lctl network up', but then 'lctl list_nids' will
> invariably give me only
>   192.168.0.1@tcp
> no matter how I twist the modprobe-config (ip2nets="o2ib",
> network="o2ib", network="o2ib(ib0), etc.)
>
> This is true as long as I have ib0 configured with the IP 192.168.0.1
> Once I unconfigure it, I get, quite expectedly,
> LNET configure error 100: Network is down

So ib0 is the only network interface in the system? In that case, I could imagine that ksocklnd gets loaded unconditionally, always grabs the first interface it can get hold of, and just doesn't leave any IB interface for ko2iblnd when it eventually gets loaded. This is just a shot in the dark, but you could check by loading the modules manually via insmod.

Regards,
Daniel.
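Daniel's first point is easy to check mechanically. The function below is a hypothetical sketch that scans modprobe configuration text (for example the /etc/modprobe.d/lustre file, or the output of 'modprobe -c') and flags 'options' lines whose first argument contains '=' (a parameter where a module name should be) or that carry no parameter at all. It ignores finer points of modprobe syntax.

```python
from typing import List, Tuple


def suspicious_options(config_text: str) -> List[Tuple[int, str]]:
    """Flag 'options' lines whose first argument looks like a parameter rather than a module name."""
    flagged = []
    for lineno, raw in enumerate(config_text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("#"):
            continue  # comment line
        fields = line.split()
        if not fields or fields[0] != "options":
            continue
        # modprobe expects: options <module> <param>=<value> ...
        if len(fields) < 3 or "=" in fields[1]:
            flagged.append((lineno, line))
    return flagged


config = 'options ip2nets="o2ib0 192.168.0.[1-5]"\noptions lnet networks="o2ib0(ib0)"\n'
print(suspicious_options(config))
```

Run against Thomas's original line, it flags line 1, since 'ip2nets=...' sits where the module name 'lnet' belongs.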
Hi all,

I did get my InfiniBand LNET up and working, using the modprobe line

  options lnet networks=o2ib0(ib0) routes="tcp1 192.168.0.3@o2ib0"

The only thing I did was to throw away and rewrite the lustre modprobe.d file with this line, several times. Finally it worked.

Cheers,
Thomas

On 22.06.2010 16:19, Thomas Roth wrote:
> [...]