Hi,

I have two InfiniBand clusters, each in a separate location, with solid Ethernet connectivity between them. Call them cluster A and cluster B. Every node in each cluster has both IB and Ethernet networks available; the IB network is not routed between cluster A and cluster B, but Ethernet is. Each cluster has 4 OSSs serving FC disks. Clients on cluster A mount Lustre file systems from their local cluster, and likewise for cluster B, in both cases over InfiniBand NIDs.

What I would like to achieve is for clients in cluster A to mount disks from the OSSs in cluster B over the Ethernet connection, and likewise for clients in cluster B to mount disks from the OSSs in cluster A.

From my reading of the Lustre 1.8.7 manual, I got:

7.1.1 Modprobe.conf
Options under modprobe.conf are used to specify the networks available to a node. You have the choice of two different options -- the networks option, which explicitly lists the networks available and the ip2nets option, which provides a list-matching lookup. Only one option can be used at any one time. The order of LNET lines in modprobe.conf is important when configuring multi-homed servers. *If a server node can be reached using more than one network, the first network specified in modprobe.conf will be used.*

Does the last sentence mean that I cannot do that?

Thanks.

-- 
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Gouvernement du Canada | Government of Canada
You can do this; simply define networks for both devices. Assuming ib0 and eth0, you would have

options lnet networks="tcp0(eth0),o2ib0(ib0)"

The IB clients will mount using an @o2ib0 NID, and the Ethernet clients will mount using @tcp0 NIDs. Since you are explicitly specifying the network, the hop rule doesn't apply.

cliffw

On Fri, Dec 16, 2011 at 9:49 AM, Patrice Hamelin <patrice.hamelin at ec.gc.ca> wrote:
> [...]
> Is the last sentence means that I cannot do that?
> [...]

-- 
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com
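As a rough sketch of the advice above, the shell helper below assembles the `options lnet networks=` line from network:interface pairs. The function name `lnet_options_line` and the `NET:IFACE` argument convention are made up for illustration; only the resulting line format and the ib0/eth0 interface names come from the reply.

```shell
#!/bin/sh
# lnet_options_line NET:IFACE [NET:IFACE ...]
# Hypothetical helper: prints a modprobe "options lnet" line that lists each
# LNET network with the interface it is bound to, in the order given.
lnet_options_line() {
    nets=""
    for pair in "$@"; do
        net=${pair%%:*}      # text before the first ':' is the LNET network
        iface=${pair#*:}     # text after it is the interface name
        nets="${nets:+$nets,}${net}(${iface})"
    done
    printf 'options lnet networks="%s"\n' "$nets"
}

# A server reachable over both Ethernet and InfiniBand, as in the reply
lnet_options_line tcp0:eth0 o2ib0:ib0
```

With such a line on the servers, an IB client would mount with a server NID ending in @o2ib0 and an Ethernet client with one ending in @tcp0, as described in the reply.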
Cliff,

Maybe our configuration is a bit special. We are running two InfiniBand partitions, one for storage and the other for TCP over IB. The two clusters are named IB3 and IB4.

I have 4 OSSs on cluster IB3, which are configured like this:

bond0     Link encap:InfiniBand  HWaddr 80:00:00:4B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:10.10.135.115  Bcast:10.10.135.255  Mask:255.255.255.0
          inet6 addr: fe80::202:c903:e:8bc6/64 Scope:Link
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:336 (336.0 b)  TX bytes:0 (0.0 b)

eth0      Link encap:Ethernet  HWaddr E4:1F:13:60:93:C0
          inet addr:10.10.132.115  Bcast:10.10.132.255  Mask:255.255.255.0
          inet6 addr: fe80::e61f:13ff:fe60:93c0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:85 errors:0 dropped:0 overruns:0 frame:0
          TX packets:91 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:10707 (10.4 KiB)  TX bytes:10607 (10.3 KiB)
          Interrupt:169 Memory:92000000-92012800

ib0.8001  Link encap:InfiniBand  HWaddr 80:00:00:4A:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:65520  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:168 (168.0 b)  TX bytes:0 (0.0 b)

ib1.8001  Link encap:InfiniBand  HWaddr 80:00:00:4B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:65520  Metric:1
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:168 (168.0 b)  TX bytes:0 (0.0 b)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:560 (560.0 b)  TX bytes:560 (560.0 b)

[root@ib3-st01 ~]# cat /etc/modprobe.conf
alias eth0 bnx2
alias eth1 bnx2
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptsas
alias scsi_hostadapter2 ata_piix
alias scsi_hostadapter3 qla2xxx
alias usb0 cdc_ether
alias bond0 bonding
options bond0 miimon=100 mode=1
options lnet networks="o2ib(bond0),tcp(eth0)"
options ost oss_num_threads=24

I formatted the MGS/MDT like this:

mkfs.lustre --mgs --mdt --fsname=sata --reformat /dev/mpath/emcssd-1

And the 8 OSTs like this:

mkfs.lustre --fsname sata --reformat --ost --mgsnode=10.10.135.115@o2ib --mgsnode=10.10.132.115@tcp /dev/mpath/colosse4-lun53-sata

[root@ib3-st01 ~]# cat /etc/ha.d/haresources
ib3-st01 Filesystem::/dev/mpath/emcssd-1::/mnt/mdt-colosse::lustre
ib3-st01 Filesystem::/dev/mpath/colosse4-lun53-sata::/mnt/data/clun53::lustre
ib3-st02 Filesystem::/dev/mpath/colosse4-lun54-sata::/mnt/data/clun54::lustre
ib3-st03 Filesystem::/dev/mpath/colosse4-lun55-sata::/mnt/data/clun55::lustre
ib3-st04 Filesystem::/dev/mpath/colosse4-lun56-sata::/mnt/data/clun56::lustre
ib3-st01 Filesystem::/dev/mpath/colosse4-lun57-sata::/mnt/data/clun57::lustre
ib3-st02 Filesystem::/dev/mpath/colosse4-lun58-sata::/mnt/data/clun58::lustre
ib3-st03 Filesystem::/dev/mpath/colosse4-lun59-sata::/mnt/data/clun59::lustre
ib3-st04 Filesystem::/dev/mpath/colosse4-lun60-sata::/mnt/data/clun60::lustre

[root@ib3-st01 ~]# lctl list_nids
10.10.135.115@o2ib
10.10.132.115@tcp

service heartbeat start

Client on cluster IB3:

ib3-bc3e41-be01:~# ifconfig
ib0.8001  Link encap:UNSPEC  HWaddr 80-00-00-51-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.10.135.74  Bcast:10.10.135.255  Mask:255.255.255.0
          inet6 addr: fe80::224:e890:97fe:fc91/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:5580 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:2048
          RX bytes:430797 (430.7 KB)  TX bytes:0 (0.0 B)

ib0.8608  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.10.133.74  Bcast:10.10.133.255  Mask:255.255.255.0
          inet6 addr: fe80::224:e890:97fe:fc91/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:209527 errors:0 dropped:0 overruns:0 frame:0
          TX packets:99270 errors:0 dropped:2 overruns:0 carrier:0
          collisions:0 txqueuelen:2048
          RX bytes:20774987 (20.7 MB)  TX bytes:16029957 (16.0 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:157814 errors:0 dropped:0 overruns:0 frame:0
          TX packets:157814 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7262472 (7.2 MB)  TX bytes:7262472 (7.2 MB)

ib3-bc3e41-be01:/proc/fs/lustre/osc# cat /etc/modprobe.d/lustre.conf
options lnet networks="o2ib(ib0.8001),tcp(ib0.8608)

I am able to mount over both o2ib and tcp (strange, but it works!):

ib3-bc3e41-be01:/proc/fs/lustre/osc# mount -t lustre
10.10.135.115@o2ib:/sata on /mnt/sata type lustre (rw)
10.10.132.115@tcp:/sata on /mnt/sata type lustre (rw)

The same goes for clients on cluster IB4.

What I would like to achieve is a TCP mount from cluster IB4 to cluster IB3.

Clients on cluster IB4 look like this:

ib4-bc1f82-be01:~# ifconfig
ib0.8003  Link encap:UNSPEC  HWaddr 80-00-00-50-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.10.142.26  Bcast:10.10.142.255  Mask:255.255.255.0
          inet6 addr: fe80::224:e890:97fe:fca9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:2530 errors:0 dropped:0 overruns:0 frame:0
          TX packets:280 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:2048
          RX bytes:609159 (609.1 KB)  TX bytes:16936 (16.9 KB)

ib0.8613  Link encap:UNSPEC  HWaddr 80-00-00-4A-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.10.140.26  Bcast:10.10.140.255  Mask:255.255.255.0
          inet6 addr: fe80::224:e890:97fe:fca9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:4218 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3196 errors:0 dropped:1 overruns:0 carrier:0
          collisions:0 txqueuelen:2048
          RX bytes:570916 (570.9 KB)  TX bytes:1665488 (1.6 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1455 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1455 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:69554 (69.5 KB)  TX bytes:69554 (69.5 KB)

ib4-bc1f82-be01:~# cat /etc/modprobe.d/lustre.conf
options lnet networks="o2ib(ib0.8003),tcp(ib0.8613)"

ib4-bc1f82-be01:~# lctl ping 10.10.132.115@tcp
12345-0@lo
12345-10.10.135.115@o2ib
12345-10.10.132.115@tcp

ib4-bc1f82-be01:~# mount -t lustre 10.10.132.115@tcp:/sata /mnt/sata

That hangs, and the log file says:

Dec 19 12:43:50 ib4-bc1f82-be01 kernel: [ 1649.617429] Lustre: 2420:0:(import.c:517:import_select_connection()) sata-MDT0000-mdc-ffff880c3a9e6400: tried all connections, increasing latency to 1s
Dec 19 12:45:05 ib4-bc1f82-be01 kernel: [ 1724.492699] Lustre: 2420:0:(import.c:517:import_select_connection()) sata-MDT0000-mdc-ffff880c3a9e6400: tried all connections, increasing latency to 4s
Dec 19 12:45:05 ib4-bc1f82-be01 kernel: [ 1724.492705] Lustre: 2420:0:(import.c:517:import_select_connection()) Skipped 2 previous similar messages
Dec 19 12:47:35 ib4-bc1f82-be01 kernel: [ 1874.243747] Lustre: 2420:0:(import.c:517:import_select_connection()) sata-MDT0000-mdc-ffff880c3a9e6400: tried all connections, increasing latency to 10s
Dec 19 12:47:35 ib4-bc1f82-be01 kernel: [ 1874.243754] Lustre: 2420:0:(import.c:517:import_select_connection()) Skipped 5 previous similar messages
Dec 19 12:52:35 ib4-bc1f82-be01 kernel: [ 2173.742386] Lustre: 2420:0:(import.c:517:import_select_connection()) sata-MDT0000-mdc-ffff880c3a9e6400: tried all connections, increasing latency to 21s
Dec 19 12:52:35 ib4-bc1f82-be01 kernel: [ 2173.742393] Lustre: 2420:0:(import.c:517:import_select_connection()) Skipped 10 previous similar messages
Dec 19 12:52:35 ib4-bc1f82-be01 kernel: [ 2173.742544] Lustre: 2419:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388626094064659 sent from sata-MDT0000-mdc-ffff880c3a9e6400 to NID 10.10.135.115@o2ib 0s ago has failed due to network error (26s prior to deadline).
Dec 19 12:52:35 ib4-bc1f82-be01 kernel: [ 2173.742547] req@ffff880c3b0e6400 x1388626094064659/t0 o38->sata-MDT0000_UUID@10.10.135.115@o2ib:12/10 lens 368/584 e 0 to 1 dl 1324299181 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 19 12:52:35 ib4-bc1f82-be01 kernel: [ 2173.742554] Lustre: 2419:0:(client.c:1487:ptlrpc_expire_one_request()) Skipped 23 previous similar messages

It seems I get a network error from "sata-MDT0000-mdc-ffff880c3a9e6400" to NID "10.10.135.115@o2ib".

The same thing happens if I try to mount the IB4 Lustre file systems from IB3 clients.

What am I missing here?

Thanks.

On 12/16/11 22:27, Cliff White wrote:
> You can do this, simply define networks for both devices.
> Assuming ib0, and eth0, you would have
> options lnet networks="tcp0(eth0),o2ib0(ib0)"
> [...]

-- 
Patrice Hamelin
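One way to see which server NID the client is actually trying is to pull the target NIDs out of the `ptlrpc_expire_one_request` messages. The sketch below assumes the kernel log text uses the `to NID <nid>` wording shown in the messages above; the function name `failed_nids` is hypothetical.

```shell
#!/bin/sh
# failed_nids: read kernel log text on stdin and print, with a count, each
# NID that appears after "to NID" in a message reporting a network error.
failed_nids() {
    sed -n '/failed due to network error/s/.* to NID \([^ ]*\) .*/\1/p' |
        sort | uniq -c
}

# Example with one of the messages from the thread
echo 'Lustre: 2419:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388626094064659 sent from sata-MDT0000-mdc-ffff880c3a9e6400 to NID 10.10.135.115@o2ib 0s ago has failed due to network error (26s prior to deadline).' | failed_nids
```

For this message the failing request targets the o2ib NID even though the mount command named the tcp NID, which suggests the client is choosing among all the NIDs the server advertises rather than only the one given on the command line.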
OK! Found the solution (it came from a Lustre user). So simple!

Quote:
---
I think the possible solution to your problem lies in differentiating the two different IB networks - by changing the lustre lnet device names. This means that each separate cluster would have different non-default "o2ib" naming convention in modprobe.conf.

The IB3 lustre servers might call it:
options lnet networks="o2ib3(bond0),tcp(eth0)"

and the IB4 lustre servers might call it:
options lnet networks="o2ib4(bond0),tcp(eth0)"
---

That solution works perfectly. Thanks to everyone who replied!

Season's Greetings all!

On 12/19/11 12:57, Patrice Hamelin wrote:
> Maybe our configuration is a bit special. We are running two
> Infiniband partitions, one for storage and the other for TCP over IB.
> [...]

-- 
Patrice Hamelin
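A plausible reading of why the rename helps, offered as an assumption rather than something stated in the thread: a client treats a server NID as directly reachable when the NID's network name matches one of its own LNET networks. With both clusters using the default name `o2ib`, an IB4 client would consider the IB3 server's `@o2ib` NID local even though the two fabrics are not connected. The toy model below illustrates only that name matching; `local_nids` is a hypothetical function, and the `o2ib4` client network is assumed to mirror the server-side rename.

```shell
#!/bin/sh
# local_nids "CLIENT_NETS" NID [NID ...]
# Toy model, not LNET itself: print each server NID whose network name (the
# part after '@') equals one of the client's configured network names.
local_nids() {
    client_nets=$1
    shift
    for nid in "$@"; do
        net=${nid#*@}
        for client_net in $client_nets; do
            if [ "$net" = "$client_net" ]; then
                echo "$nid"
            fi
        done
    done
}

# Before the rename: both of the IB3 server's NIDs look local to an IB4 client
local_nids "o2ib tcp" 10.10.135.115@o2ib 10.10.132.115@tcp

# After the rename: only the tcp NID matches
local_nids "o2ib4 tcp" 10.10.135.115@o2ib3 10.10.132.115@tcp
```

Under this reading, the clients in each cluster would need the same network name as their own servers (for example `o2ib3` on IB3 clients) to keep mounting over InfiniBand.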