Christian Gajan
2008-May-05 10:38 UTC
[Lustre-discuss] mount.lustre: mount /dev/sdd1 at /hpcdata/ost1 failed: Protocol error
Hi,

My Lustre configuration is RHEL 5.1 (2.6.18-53.1.13 + Lustre patch) + OFED 1.2.5.5 + Lustre 1.6.4.3.

My LNET seems to work correctly, but when I try to configure my OSS I get a Protocol error at the mount step.

Has anyone already used Lustre on InfiniBand (OFED stack) under Red Hat 5.1 (2.6.18-53.1.13)?
If yes, with which version of OFED and which version of Lustre?

Thanks in advance for your help.

Here are some details.

MDT/MDS server

# modprobe lnet
# lctl network up
# lctl list_nids
192.168.1.17@o2ib
# mkfs.lustre -fsname hpcdata --mdt --mgs /dev/sdb2
# mkdir -p /hpcdata/mdt
# mount -t lustre /dev/sdb1 /hpcdata/mdt

All is OK here.

OST server

# modprobe lnet
# lctl network up
# lctl list_nids
192.168.1.16@o2ib
# lctl ping 192.168.1.17@o2ib
12345 - 0@lo
12345 - 192.168.1.17@o2ib
# mkfs.lustre --fsname hpcdata --ost --mgsnode=192.168.1.17@o2ib /dev/sdd1
# mkdir -p /hpcdata/ost1
# mount -t lustre /dev/sdd1 /hpcdata/ost1
mount.lustre: mount /dev/sdd1 at /hpcdata/ost1 failed: Protocol error
!!!!!!!!!

I got this in /var/log/messages (OST side):

Apr 24 07:17:55 s1206 kernel: kjournald starting. Commit interval 5 seconds
Apr 24 07:17:55 s1206 kernel: LDISKFS FS on sdd1, internal journal
Apr 24 07:17:55 s1206 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Apr 24 07:17:55 s1206 kernel: kjournald starting. Commit interval 5 seconds
Apr 24 07:17:55 s1206 kernel: LDISKFS FS on sdd1, internal journal
Apr 24 07:17:55 s1206 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Apr 24 07:17:55 s1206 kernel: LDISKFS-fs: file extents enabled
Apr 24 07:17:55 s1206 kernel: LDISKFS-fs: mballoc enabled
Apr 24 07:17:55 s1206 kernel: LustreError: 13483:0:(pack_generic.c:782:lustre_unpack_msg()) bad lustre msg magic: 0X26F54000
Apr 24 07:17:55 s1206 kernel: LustreError: 13483:0:(client.c:613:after_reply()) @@@ unpack_rep failed: -22 req@ffff81022eb06800 x11/t0 o253->MGS@MGC192.168.1.17@o2ib_0:26 lens 4672/4672 ref 1 fl Rpc:R/0/0 rc 0/-22
Apr 24 07:17:56 s1206 kernel: LustreError: 13483:0:(obd_mount.c:954:server_register_target()) registration with the MGS failed (-71)
Apr 24 07:17:56 s1206 kernel: LustreError: 13483:0:(obd_mount.c:1054:server_start_targets()) Required registration failed for hpcdata-OSTffff: -71
Apr 24 07:17:56 s1206 kernel: LustreError: 13483:0:(obd_mount.c:1570:server_fill_super()) Unable to start targets: -71
Apr 24 07:17:56 s1206 kernel: LustreError: 13483:0:(obd_mount.c:1368:server_put_super()) no obd hpcdata-OSTffff
Apr 24 07:17:56 s1206 kernel: LustreError: 13483:0:(obd_mount.c:119:server_deregister_mount()) hpcdata-OSTffff not registered
Apr 24 07:17:56 s1206 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
Apr 24 07:17:56 s1206 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
Apr 24 07:17:56 s1206 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
Apr 24 07:17:56 s1206 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Apr 24 07:17:56 s1206 kernel: Lustre: server umount hpcdata-OSTffff complete
Apr 24 07:17:56 s1206 kernel: LustreError: 13483:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-71)

I got this in /var/log/messages (MDS side):

Apr 24 08:04:53 s1207 kernel: ko2iblnd: no version for "ib_fmr_pool_unmap" found: kernel tainted.
Apr 24 08:04:53 s1207 kernel: Lustre: Added LNI 192.168.1.17@o2ib [8/64]
Apr 24 08:05:30 s1207 kernel: kjournald starting. Commit interval 5 seconds
Apr 24 08:05:30 s1207 kernel: LDISKFS FS on sdb2, internal journal
Apr 24 08:05:30 s1207 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Apr 24 08:05:30 s1207 kernel: kjournald starting. Commit interval 5 seconds
Apr 24 08:05:30 s1207 kernel: LDISKFS FS on sdb2, internal journal
Apr 24 08:05:30 s1207 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
Apr 24 08:05:30 s1207 kernel: Lustre: MGS MGS started
Apr 24 08:05:30 s1207 kernel: Lustre: Enabling user_xattr
Apr 24 08:05:30 s1207 kernel: Lustre: MDT hpcdata-MDT0000 now serving dev (hpcdata-MDT0000/f8e01d68-5859-94c6-4f92-cde70c2d1781) with recovery enabled
Apr 24 08:05:30 s1207 kernel: Lustre: 4974:0:(lproc_mds.c:260:lprocfs_wr_group_upcall()) hpcdata-MDT0000: group upcall set to /usr/sbin/l_getgroups
Apr 24 08:05:30 s1207 kernel: Lustre: hpcdata-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
Apr 24 08:05:31 s1207 kernel: Lustre: Server hpcdata-MDT0000 on device /dev/sdb2 has started
Apr 24 08:05:36 s1207 kernel: LustreError: 4988:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1209049530, 5s ago) req@ffff8102321de000 x7/t0 o8->hpcdata-OST0000_UUID@192.168.1.16@o2ib:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
Apr 24 08:05:36 s1207 kernel: LustreError: 4988:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1209049530, 6s ago) req@ffff81022861ae00 x8/t0 o8->hpcdata-OST0001_UUID@192.168.1.16@o2ib:6 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22

Have I made a mistake? Is it a bug? (A quick search on bugzilla.lustre.org turned up nothing.)

Thanks in advance for your help.

Regards,
Christian
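The negative numbers in the OST-side log are kernel-style return codes, and reading them as errno values may help trace where the "Protocol error" text comes from. The following is a minimal sketch, not part of the original post: the errno table is hard-coded on the assumption of Linux numbering (the nodes run RHEL 5.1), the helper name `decode` is hypothetical, and the three sample lines are copied from the log above with the timestamps and the trailing request dump trimmed.

```python
import re

# Assumption: Linux errno numbering, hard-coded so the sketch does not
# depend on the platform it is run on.
LINUX_ERRNO = {
    22: ("EINVAL", "Invalid argument"),
    71: ("EPROTO", "Protocol error"),
}

# Lines taken from the OST-side log in the post (timestamps and the
# request dump after the first return code are trimmed).
LOG_LINES = [
    "LustreError: 13483:0:(client.c:613:after_reply()) @@@ unpack_rep failed: -22",
    "LustreError: 13483:0:(obd_mount.c:954:server_register_target()) registration with the MGS failed (-71)",
    "LustreError: 13483:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-71)",
]

# Capture the reporting function name and the negative code at the end of the line.
PATTERN = re.compile(r"(\w+)\(\)\).*?-(\d+)\)?$")


def decode(line):
    """Return (function, errno name, errno text) for a log line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    func = match.group(1)
    code = int(match.group(2))
    name, text = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
    return func, name, text


if __name__ == "__main__":
    for line in LOG_LINES:
        print(decode(line))
```

On these lines the sketch maps -22 to EINVAL (the reply could not be unpacked) and -71 to EPROTO, which matches the "Protocol error" string printed by mount.lustre.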
Andreas Dilger
2008-May-06 20:45 UTC
[Lustre-discuss] mount.lustre: mount /dev/sdd1 at /hpcdata/ost1 failed: Protocol error
On May 05, 2008 12:38 +0200, Christian Gajan wrote:
> my lustre configuration is RHEL5.1 (2.6.18-53.1.13 + lustre patch) + OFED
> 1.2.5.5 + lustre 1.6.4.3
> my lnet seem to work correctly but when I try to configure my OSS I got a
> Protocol error at the mount step
>
> Has someone already used lustre on infiniband (ofed stack) under redhat 5.1
> (2.6.18-53.1.13) ?
> If Yes : with which version of OFED and which version of Lustre ?
>
> Thanks in advance for your help
>
> Here some detail
>
> OST server
> # modprobe lnet
> # lctl network up
> # lctl list_nids
> 192.168.1.16@o2ib
> # lctl ping 192.168.1.17@o2ib
> 12345 - 0@lo
> 12345 - 192.168.1.17@o2ib
> # mkfs.lustre --fsname hpcdata --ost --mgsnode=192.168.1.17@o2ib /dev/sdd1
> # mkdir -p /hpcdata/ost1
> # mount -t lustre /dev/sdd1 /hpcdata/ost1
> mount.lustre: mount /dev/sdd1 at /hpcdata/ost1 failed: Protocol error
> !!!!!!!!!
>
> I got in the /var/log/messages (OST side)
>
> Apr 24 07:17:55 s1206 kernel: LustreError:
> 13483:0:(pack_generic.c:782:lustre_unpack_msg()) bad lustre msg magic:
> 0X26F54000

This magic value is completely incorrect - it should be something like 0x0bd00bd3. Are you using some strange CPU architecture on the MDS or OSS node by some chance?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
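The question about CPU architecture suggests checking whether the bad magic is simply the expected value in the opposite byte order. The sketch below is an illustration only, not Lustre code: it uses the expected value quoted in the reply (0x0bd00bd3) and the value from the log (0X26F54000), and the names `EXPECTED_MAGIC`, `byteswap32` and `classify_magic` are hypothetical.

```python
import struct

# Expected magic as quoted in the reply above.
EXPECTED_MAGIC = 0x0BD00BD3

# Value reported by lustre_unpack_msg() in the OST-side log.
OBSERVED_MAGIC = 0x26F54000


def byteswap32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def classify_magic(value):
    """Say whether a 32-bit magic matches, matches after a byte swap, or neither."""
    if value == EXPECTED_MAGIC:
        return "native"
    if byteswap32(value) == EXPECTED_MAGIC:
        return "byte-swapped"
    return "unrecognized"


if __name__ == "__main__":
    print(hex(byteswap32(OBSERVED_MAGIC)), classify_magic(OBSERVED_MAGIC))
```

Under these assumptions the observed value matches the expected magic in neither byte order, so a plain endianness difference between the two nodes would not by itself account for it.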