Erik Froese
2009-Jun-03 21:45 UTC
[Lustre-discuss] Configuring Lustre routring between two tcp networks
I'm trying to configure a lustre router so I can mount a test lustre FS over our standard network here (NYU federated ethernet). We have a small rocks cluster with one MDS/MGS and 3 OSSs on a private switch. It's a pretty standard rocks configuration. The cluster network is 10.1.255.0/24.

== OSS / Router ==

One of the OSSs (oss-0-2) is configured as follows:

eth0 - 10.1.255.247
eth1 - 128.122.x.y

In its /etc/modprobe.conf I have the following:

options lnet forwarding="enabled"
options lnet accept=all
options lnet networks="tcp0(eth0),tcp1(eth1)"

[root@oss-0-2 ~]# lctl list_nids
10.1.255.247@tcp
128.122.x.y@tcp1

== Routed Client ==

Then I have another client on the 128.122.x.* network. Let's call it 128.122.x.z. It just has eth0 configured as 128.122.x.z, and in its modprobe.conf:

options lnet networks=tcp0(eth0) routes="tcp1 128.122.x.y@tcp0"

Now should I be able to mount the lustre fs as such?

mount.lustre 10.1.255.252@tcp0:/scratch /scratch
mount.lustre: mount 10.1.255.252@tcp:/scratch at /scratch failed: Cannot send after transport endpoint shutdown

I don't see it sending any traffic to the router with tcpdump running on the router. What am I doing wrong? Should I be using the 128.122 address of the router to try to mount? Am I missing a configuration somewhere?

Thanks
Erik Froese
NYU
Andreas Dilger
2009-Jun-04 03:56 UTC
[Lustre-discuss] Configuring Lustre routring between two tcp networks
On Jun 03, 2009 17:45 -0400, Erik Froese wrote:
> I'm trying to configure a lustre router so I can mount a test lustre FS over
> our standard network here (NYU federated ethernet).
> We have a small rocks cluster with one MDS/MGS and 3 OSSs on a private
> switch. Its a pretty standard rocks configuration.
> The cluster network is 10.1.255.0/24.
>
> == OSS / Router ==
> One of the OSS (oss-0-2) is configured as follows:
> eth0 - 10.1.255.247
> eth1 - 128.122.x.y
>
> In its /etc/modprobe.conf I have the following
> options lnet forwarding="enabled"
> options lnet accept=all
> options lnet networks="tcp0(eth0),tcp1(eth1)"
>
> [root@oss-0-2 ~]# lctl list_nids
> 10.1.255.247@tcp
> 128.122.x.y@tcp1

I'm not a routing expert, but I think I can see what is wrong here. To reiterate - your private network is tcp0 (10.x), and your external network is tcp1 (128.x).

> == Routed Client ==
> Then I have another client on the 128.122.x.* network. Let's call it
> 128.122.x.z It just has eth0 configured as 128.122.x.z
>
> and in its modprobe.conf
> options lnet networks=tcp0(eth0) routes="tcp1 128.122.x.y@tcp0"

Here you are configuring your external client to use tcp0 as 128.x, which does NOT match what you have configured on your router. You need to have (I think):

options lnet networks=tcp1(eth0) routes="tcp0 128.122.x.y@tcp1"

> Now should I be able to mount the lustre fs as such?
> mount.lustre 10.1.255.252@tcp0:/scratch /scratch
> mount.lustre: mount 10.1.255.252@tcp:/scratch at /scratch failed: Cannot
> send after transport endpoint shutdown

This should be fine, once your routing is working. You could try "lctl ping 128.122.x.y@tcp1" to verify you can communicate with your router/OSS and "lctl ping 10.1.255.252@tcp0" for the MDS behind it.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
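The mismatch described above can be checked mechanically: the gateway NID in a client's routes= option has to be one of the NIDs the router itself reports. What follows is a minimal sketch, not a Lustre tool. The function name gateway_in_nids is hypothetical, and it compares strings literally, so the network name must be written the same way "lctl list_nids" prints it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): succeeds if the gateway NID named
# in a client's routes= option is among the NIDs reported by the router.
# Usage: gateway_in_nids GATEWAY_NID ROUTER_NID...
gateway_in_nids() {
    local gateway=$1
    shift
    local nid
    for nid in "$@"; do
        # Literal comparison: both the address and the network name must match.
        if [ "$nid" = "$gateway" ]; then
            return 0
        fi
    done
    return 1
}

# NIDs printed by "lctl list_nids" on the router in this thread.
router_nids="10.1.255.247@tcp 128.122.x.y@tcp1"

# First the gateway from the original client config, then the suggested one.
for gateway in "128.122.x.y@tcp0" "128.122.x.y@tcp1"; do
    # $router_nids is left unquoted on purpose so it splits into one NID per word.
    if gateway_in_nids "$gateway" $router_nids; then
        echo "$gateway: router has this NID"
    else
        echo "$gateway: router has no such NID"
    fi
done
```

With the values from this thread, the original gateway (128.122.x.y@tcp0) is reported as missing on the router and the suggested one (128.122.x.y@tcp1) as present.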
Isaac Huang
2009-Jun-04 13:53 UTC
[Lustre-discuss] Configuring Lustre routring between two tcp networks
On Wed, Jun 03, 2009 at 05:45:10PM -0400, Erik Froese wrote:
> ......
> I don't see it sending any traffic to the router with tcpdump running
> on the router.

Alternatively, you may run 'routerstat 1' on the router to see how much data is being forwarded per second.

Isaac
Erik Froese
2009-Jun-04 17:59 UTC
[Lustre-discuss] Configuring Lustre routring between two tcp networks
Thanks Andreas and Natalie,

I've made the changes you suggested (setting tcp1 as the external network) and I'm able to lctl ping the 128.122.x.y address but I still cannot ping the private address for the MDS.

Could the problem be that the lustre fs on the private network is actually called tcp and not tcp0? Are those synonymous?

Erik
Isaac Huang
2009-Jun-05 16:48 UTC
[Lustre-discuss] Configuring Lustre routring between two tcp networks
On Thu, Jun 04, 2009 at 01:59:48PM -0400, Erik Froese wrote:
> Thanks Andreas and Natalie,
>
> I've made the changes you suggested (setting tcp1 as the external
> network) and I'm able to lctl ping the 128.122.x.y address but I still
> cannot ping the private address for the MDS.

Please show us the commands you've run and their outputs, together with error messages in dmesg. It'd help to "echo +neterror > /proc/sys/lnet/printk" before running the commands.

> Could the problem be that the lustre fs on the private network is
> actually called tcp and not tcp0? Are those synonymous?

No, 'tcp' is just a shorthand for 'tcp0' - they are 100% equivalent to each other.

Isaac
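Since 'tcp' and 'tcp0' name the same network, it can help to rewrite NIDs into one spelling before comparing a config file against "lctl list_nids" output. A minimal sketch, assuming only the equivalence stated above: normalize_nid is a hypothetical helper, not an lctl feature, and it simply appends 0 to a network name that has no trailing number.

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): print a NID with an explicit
# network number, so that "@tcp" and "@tcp0" compare equal as strings.
normalize_nid() {
    local nid=$1
    case $nid in
        *@*[0-9]) printf '%s\n' "$nid" ;;    # already numbered, e.g. @tcp1
        *@*)      printf '%s0\n' "$nid" ;;   # bare name, e.g. @tcp becomes @tcp0
        *)        printf '%s\n' "$nid" ;;    # no network part, leave untouched
    esac
}

normalize_nid 10.1.255.252@tcp      # prints 10.1.255.252@tcp0
normalize_nid 10.1.255.252@tcp0     # prints 10.1.255.252@tcp0
normalize_nid 128.122.109.28@tcp1   # prints 128.122.109.28@tcp1
```

The thread only confirms the equivalence for tcp, so treat the result for other network types as an assumption.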
Erik Froese
2009-Jun-12 02:51 UTC
[Lustre-discuss] Configuring Lustre routring between two tcp networks
OK here's where I am now.

The public client can ping the router's public address but not the private address.

[root@routed-client lnet]$ cat /etc/modprobe.conf
alias eth0 e1000
alias eth1 e1000
alias scsi_hostadapter megaraid_mbox
alias scsi_hostadapter1 ata_piix
# eth0 is part of tcp1 (NYU-NET)
# In order to get to tcp (Cluster private), use the network on
# 128.122.X.Y@tcp1
options lnet accept=all
options lnet networks=tcp1(eth0) routes="tcp 128.122.X.Y@tcp1"

[root@routed-client lnet]$ lctl network up
LNET configured

[root@routed-client lnet]$ cat /proc/sys/lnet/routes
Routing disabled
net hops state router
tcp 1 up 128.122.109.28@tcp1

[root@routed-client lnet]$ cat /proc/sys/lnet/routers
ref rtr_ref alive_cnt state last_ping router
3 1 0 up 0 128.122.109.28@tcp1

[root@routed-client lnet]$ lctl ping 128.122.109.28@tcp1
12345-0@lo
12345-10.1.255.247@tcp
12345-128.122.109.28@tcp1

[root@routed-client lnet]$ lctl ping 10.1.255.252@tcp
failed to ping 10.1.255.252@tcp: Input/output error

I can see traffic between the routed-client and the router as well as between the router and the MGS/MDS (10.1.255.252@tcp).

The mgs has the following config.

[root@mgs-0-0 lnet]# cat /etc/modprobe.conf
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptsas
alias scsi_hostadapter2 usb-storage
alias eth0 e1000
alias eth1 e1000
alias eth2 e1000
alias eth3 e1000
options lnet forwarding="enabled"
options lnet accept=all
options lnet networks=tcp(eth0) routes="tcp1 10.1.255.247@tcp"

[root@mgs-0-0 lnet]# lctl network up
LNET configured

But it doesn't see any routes or routers.

[root@mgs-0-0 lnet]# cat /proc/sys/lnet/routes
Routing disabled
net hops state router

[root@mgs-0-0 lnet]# cat /proc/sys/lnet/routers
ref rtr_ref alive_cnt state last_ping router

And this is what /var/log/messages and dmesg contain, with or without enabling neterror logging:

Jun 11 22:41:07 mgs-0-0 kernel: LustreError: 10869:0:(lib-move.c:1250:lnet_send()) No route to 12345-128.122.X.Y@tcp1
Jun 11 22:41:07 mgs-0-0 kernel: LustreError: 10869:0:(lib-move.c:1723:lnet_parse_get()) 10.1.255.252@tcp: Unable to send REPLY for GET from 12345-128.122.X.Y@tcp1: -113
Isaac Huang
2009-Jun-14 01:48 UTC
[Lustre-discuss] Configuring Lustre routring between two tcp networks
On Thu, Jun 11, 2009 at 10:51:01PM -0400, Erik Froese wrote:
> OK here's where I am now.
>
> The public client can ping the routers public address but not the
> private address.
>
> [root@routed-client lnet]$ cat /etc/modprobe.conf
> ......
> options lnet accept=all

This would allow connections from unprivileged ports, which is probably not what you want unless you have liblustre clients. The default "accept" setting should work fine.

> ......
> [root@routed-client lnet]$ lctl ping 10.1.255.252@tcp
> failed to ping 10.1.255.252@tcp: Input/output error
> I can see traffic between the routed-client and the router as well as
> between the router and the MGS/MDS (10.1.255.252@tcp)
>
> The mgs has the following config.
>
> [root@mgs-0-0 lnet]# cat /etc/modprobe.conf
> ......
> options lnet forwarding="enabled"

Only needed for routers.

> options lnet networks=tcp(eth0) routes="tcp1 10.1.255.247@tcp"

This looks good.

> [root@mgs-0-0 lnet]# lctl network up
> LNET configured
> But it doesn't see any routes or routers.

Which was why the client couldn't ping the MGS - the ping request arrived at the MGS, but the MGS didn't have a route to send its reply back to the client network.

> [root@mgs-0-0 lnet]# cat /proc/sys/lnet/routes
> Routing disabled
> net hops state router

Could you try to unload and load the lnet module again? Or put quotes around all the option values in modprobe.conf, or pass the options directly on the modprobe command line. Somehow the routes option seems to have been ignored.

Thanks,
Isaac
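After reloading the module, the thing to confirm is that the MGS now lists a route back to the client network. A minimal sketch of that check: has_route_to is a hypothetical helper, and it assumes the routes table has the layout quoted in this thread (a status line, a header line, then one whitespace-separated row per route with the network name in the first column).

```shell
#!/bin/sh
# Hypothetical helper (not part of Lustre): read a routes table on stdin and
# succeed if some row has the requested network in its first column.
# Usage on a live node (assumed path, as quoted in this thread):
#   has_route_to tcp1 < /proc/sys/lnet/routes
has_route_to() {
    awk -v net="$1" '
        # Route rows have four columns: net, hops, state, router.
        NF >= 4 && $1 == net { found = 1 }
        END { exit !found }
    '
}

# Client table from this thread: it has a route to tcp.
if printf 'Routing disabled\nnet hops state router\ntcp 1 up 128.122.109.28@tcp1\n' | has_route_to tcp; then
    echo "client: route to tcp present"
fi

# MGS table from this thread: no rows, so no route to tcp1.
if ! printf 'Routing disabled\nnet hops state router\n' | has_route_to tcp1; then
    echo "mgs: no route to tcp1"
fi
```

The match on the network name is literal, so ask for the name the way the table prints it (tcp rather than tcp0 in the output above).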