Alexander Bugl
2010-Feb-17 09:01 UTC
[Lustre-discuss] lctl ping error "Unexpected version" between Lustre 1.8.1.1 and 1.8.2
Hi,

I have a Lustre 1.8.1.1 system (MDS, OSS, all CentOS 5.3) with Lustre 1.6.4.3 clients (Debian etch) running without problems. I now have 4 additional OSS nodes, which I set up using the new Lustre 1.8.2. But I can't lctl ping between 1.8.1.1 nodes and 1.8.2 nodes using InfiniBand.

To be more precise:

OSS node 1:

    [root@oss01 ~]# ifconfig | grep -C1 ib0
    ib0       Link encap:InfiniBand  HWaddr ...
              inet addr:172.16.30.134  Bcast:172.16.30.255  Mask:255.255.255.0
    [root@oss01 ~]# uname -a
    Linux oss01 2.6.18-164.11.1.el5_lustre.1.8.2 #1 SMP Fri Jan 22 19:11:17 MST 2010 x86_64 x86_64 x86_64 GNU/Linux

OSS node 5:

    [root@oss05 ~]# ifconfig | grep -C1 ib0
    ib0       Link encap:InfiniBand  HWaddr ...
              inet addr:172.16.30.138  Bcast:172.16.30.255  Mask:255.255.255.0
    [root@oss05 ~]# uname -a
    Linux oss05 2.6.18-128.7.1.el5_lustre.1.8.1.1 #1 SMP Tue Oct 6 05:48:57 MDT 2009 x86_64 x86_64 x86_64 GNU/Linux

The InfiniBand network is up and running; I can ping oss01 from oss05 and vice versa:

    [root@oss01 ~]# ping 172.16.30.138
    PING 172.16.30.138 (172.16.30.138) 56(84) bytes of data.
    64 bytes from 172.16.30.138: icmp_seq=1 ttl=64 time=0.125 ms
    64 bytes from 172.16.30.138: icmp_seq=2 ttl=64 time=0.083 ms

    [root@oss05 ~]# ping 172.16.30.134
    PING 172.16.30.134 (172.16.30.134) 56(84) bytes of data.
    64 bytes from 172.16.30.134: icmp_seq=1 ttl=64 time=2.19 ms
    64 bytes from 172.16.30.134: icmp_seq=2 ttl=64 time=0.076 ms

And I am able to lctl ping the machines on their own addresses:

    [root@oss01 ~]# lctl ping 172.16.30.134@o2ib
    12345-0@lo
    12345-172.16.30.134@o2ib

    [root@oss05 ~]# lctl ping 172.16.30.138@o2ib
    12345-0@lo
    12345-172.16.30.138@o2ib

But I can't lctl ping the other machine:

    [root@oss01 ~]# lctl ping 172.16.30.138@o2ib
    failed to ping 172.16.30.138@o2ib: Protocol error

    [root@oss05 ~]# lctl ping 172.16.30.134@o2ib
    failed to ping 172.16.30.134@o2ib: Protocol error

The dmesg/messages output is somewhat longer, but no errors are logged other than this line:

    [root@oss01 ~]# dmesg | tail -1
    LustreError: 8855:0:(api-ni.c:1781:lnet_ping()) 12345-172.16.30.138@o2ib: Unexpected version 0x1

    [root@oss05 ~]# dmesg | tail -1
    LustreError: 19249:0:(api-ni.c:1735:lnet_ping()) 12345-172.16.30.134@o2ib: Unexpected version 0x2

I did not find anything regarding "Unexpected version 0x?" using Google ...

So I can't mix 1.8.1.1 nodes and 1.8.2 nodes. That would be no major problem, because I could upgrade the "older" MDS and OSS nodes to 1.8.2 as well, but I currently can't upgrade the 1.6.4.3 Lustre clients. And the client nodes can't be lctl ping'ed from Lustre 1.8.2 either (172.16.30.70 being one client IP):

    [root@oss01 ~]# lctl ping 172.16.30.70@o2ib
    failed to ping 172.16.30.70@o2ib: Protocol error

I have almost no InfiniBand know-how (I inherited this system), so sorry if my question is a stupid one: what is going on here, and is there a simple way to solve this problem of no LNET connectivity between Lustre 1.8.2 and the older 1.8.1.1/1.6.4.3 nodes?

With regards, Alex

-- 
Alexander Bugl, Central IT Services, ZMAW
Max Planck Institute for Meteorology
Bundesstrasse 53, D-20146 Hamburg, Germany
tel +49-40-41173-351, fax -356, room PE048
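
For anyone checking the same symptom across many nodes, the two LustreError lines quoted above can be collected and grouped by peer. The following is only a minimal sketch in Python, not a Lustre tool: the function name `unexpected_versions` and the regular expression are hypothetical, modelled solely on the two dmesg lines in this message, and they expect the NID in its real form (the archive's " at " stands for "@"). Other Lustre releases may word the message differently.

```python
import re
from collections import defaultdict

# Pattern modelled on the two dmesg lines quoted in this thread.
# Assumption: other Lustre releases may format this message differently.
PING_ERR = re.compile(
    r"LustreError: \d+:\d+:\(api-ni\.c:\d+:lnet_ping\(\)\) "
    r"(?P<pid>\d+)-(?P<nid>\S+): Unexpected version (?P<version>0x[0-9a-fA-F]+)"
)


def unexpected_versions(log_text):
    """Map each peer NID to the set of version numbers reported for it."""
    seen = defaultdict(set)
    for line in log_text.splitlines():
        match = PING_ERR.search(line)
        if match:
            # int() with base 16 accepts the leading "0x" prefix.
            seen[match.group("nid")].add(int(match.group("version"), 16))
    return dict(seen)


# The two log lines from this message, with "@" restored in the NIDs.
SAMPLE = """\
LustreError: 8855:0:(api-ni.c:1781:lnet_ping()) 12345-172.16.30.138@o2ib: Unexpected version 0x1
LustreError: 19249:0:(api-ni.c:1735:lnet_ping()) 12345-172.16.30.134@o2ib: Unexpected version 0x2
"""

if __name__ == "__main__":
    for nid, versions in sorted(unexpected_versions(SAMPLE).items()):
        print(nid, sorted(hex(v) for v in versions))
```

Fed the concatenated dmesg output of several nodes, this gives one entry per peer NID, which makes it easier to see which peers each side rejects and with which version value.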
Alexander Bugl
2010-Feb-17 14:58 UTC
[Lustre-discuss] lctl ping error "Unexpected version" between Lustre 1.8.1.1 and 1.8.2
Hi,

replying to myself:

Alexander Bugl wrote:
> But I can't lctl ping the other machine:
> [root@oss01 ~]# lctl ping 172.16.30.138@o2ib
> failed to ping 172.16.30.138@o2ib: Protocol error
> [root@oss05 ~]# lctl ping 172.16.30.134@o2ib
> failed to ping 172.16.30.134@o2ib: Protocol error

It's the same issue as described in bug 21456:
https://bugzilla.lustre.org/show_bug.cgi?id=21456

(This was sent to lustre-discuss on 11 February, but I had not seen or found it ... :( )

Thanks to Dardo D Kleiner for opening my eyes, and sorry for the noise.

With regards, Alex

-- 
Alexander Bugl, Central IT Services, ZMAW
Max Planck Institute for Meteorology
Bundesstrasse 53, D-20146 Hamburg, Germany
tel +49-40-41173-351, fax -356, room PE048