Following the tutorial, I started the MGS and MDT on the first machine (IP 192.168.2.55):

mkfs.lustre --mgs /dev/sdb1
mkdir -p /mnt/mgs
mount -t lustre /dev/sdb1 /mnt/mgs
mkfs.lustre --fsname=testfs --mdt --mgsnode=192.168.2.55@tcp0 /dev/sdb2
mkdir -p /mnt/test/mdt
mount -t lustre /dev/sdb2 /mnt/test/mdt

On the second machine I tried to start the OST with:

mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.2.55@tcp0 /dev/sdb
mkdir -p /mnt/test/ost0
mount -t lustre /dev/sdb /mnt/test/ost0

but got this error:

mount.lustre: mount /dev/sdb at /mnt/test/ost0 failed: Input/output error
Is the MGS running?

The machines can communicate and the MGS is probably running. It is Lustre 1.6.4.2 and kernel 2.6.18.
Do you have any idea where the problem could be?
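A minimal LNET-level sanity check from the OSS before mounting the OST (a sketch only; the MGS NID is taken from the mkfs.lustre lines above):

lctl list_nids                   # LNET is up and this node has the expected NID
lctl ping 192.168.2.55@tcp0      # the MGS node answers over LNET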
On Wed, Feb 20, 2008 at 11:19:06PM +0100, Tomec Martin wrote:
> but got this error:
> mount.lustre: mount /dev/sdb at /mnt/test/ost0 failed: Input/output error
> Is the MGS running?

Were there any error messages in 'dmesg' on the node?

Isaac
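For reference, a generic way to collect those messages on the OSS after a failed mount attempt (standard commands, not specific to this thread):

dmesg | tail -n 50               # recent kernel messages, including any LustreError lines
lctl dk /tmp/lustre-debug.log    # dump the Lustre kernel debug buffer to a file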
Isaac Huang wrote:
> Were there any error messages in 'dmesg' on the node?

Yes, the log is below. Maybe it is some incompatibility with CentOS 5 (I used the packages built for Red Hat 5).

Lustre: OBD class driver, info@clusterfs.com
Lustre Version: 1.6.4.2
Build Version: 1.6.4.2-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.18.lustre.linux-2.6.18-8.1.14.el5_lustre.1.6.4.2smp
Lustre: Added LNI 192.168.2.56@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info@clusterfs.com
kjournald starting. Commit interval 5 seconds
LDISKFS FS on sdb, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
SELinux: initialized (dev sdb, type ldiskfs), not configured for labeling
kjournald starting. Commit interval 5 seconds
LDISKFS FS on sdb, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
SELinux: initialized (dev sdb, type ldiskfs), not configured for labeling

LustreError: 2891:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1203589568, 5s ago) req@cbaa7600 x1/t0 o250->MGS@MGC192.168.2.55@tcp_0:26 lens 240/272 ref 1 fl Rpc:/0/0 rc 0/-22
LustreError: 2858:0:(obd_mount.c:954:server_register_target()) registration with the MGS failed (-5)
LustreError: 2858:0:(obd_mount.c:1054:server_start_targets()) Required registration failed for testfs-OSTffff: -5
LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
LustreError: 2858:0:(obd_mount.c:1570:server_fill_super()) Unable to start targets: -5
LustreError: 2858:0:(obd_mount.c:1368:server_put_super()) no obd testfs-OSTffff
LustreError: 2858:0:(obd_mount.c:119:server_deregister_mount()) testfs-OSTffff not registered

LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Lustre: server umount testfs-OSTffff complete
LustreError: 2858:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-5)
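The failing RPC above is the o250 MGS connect, so it is also worth confirming on the MGS node itself that the MGS is up and LNET is listening; a minimal sketch with standard tools (nothing here is specific to this setup):

lctl dl                    # device list should include an MGS device in the UP state
netstat -tln | grep 988    # LNET acceptor should be listening on TCP port 988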
When you say they can communicate, did you try lctl ping?

On Feb 21, 2008, at 4:57 AM, Tomec Martin wrote:
> LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
> LustreError: 2858:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-5)
Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies

(301) 595-7000
aaron@iges.org
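For reference, lctl ping takes the peer's NID; run from the OSS against the MGS it either prints the peer's NID list or fails with an I/O error (the NID below is assumed from the earlier commands):

lctl ping 192.168.2.55@tcp0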
Ping to "loopback" is ok: ping 192.168.2.54 12345-0 at lo 12345-192.168.2.54 at tcp Ping to other machines: failed to ping 192.168.2.98 at tcp: Input/output error And dk log after ping: 00000400:00000100:0:1204089847.556806:0:3163:0:(linux-tcpip.c:669:libcfs_sock_connect()) Error -113 connecting 0.0.0.0/1023 -> 192.168.2.98/988 00000400:00000100:0:1204089847.556864:0:3163:0:(acceptor.c:81:lnet_connect_console_error()) Connection to 192.168.2.98 at tcp at host 192.168.2.98 was unreachable: the network or that node may be down, or Lustre may be misconfigured. 00000800:00000100:0:1204089847.556905:0:3163:0:(socklnd_cb.c:417:ksocknal_txlist_done()) Deleting packet type 2 len 0 192.168.2.54 at tcp->192.168.2.98 at tcp Aaron Knister napsal(a):> When you say they can communicate, did you try lctl ping? > > On Feb 21, 2008, at 4:57 AM, Tomec Martin wrote: > >> >> >> Isaac Huang napsal(a): >>> On Wed, Feb 20, 2008 at 11:19:06PM +0100, Tomec Martin wrote: >>>> According to tutorial I started MGS and MDT on first machine (with IP >>>> 192.168.2.55): >>>> mkfs.lustre --mgs /dev/sdb1 >>>> mkdir -p /mnt/mgs >>>> mount -t lustre /dev/sdb1 /mnt/mgs >>>> mkfs.lustre --fsname=testfs --mdt --mgsnode=192.168.2.55 at tcp0 /dev/sdb2 >>>> mkdir -p /mnt/test/mdt >>>> mount -t lustre /dev/sdb2 /mnt/test/mdt >>>> >>>> On second machine I tried start OST with: >>>> mkfs.lustre --fsname=testfs --ost --mgsnode=192.168.2.55 at tcp0 /dev/sdb >>>> mkdir -p /mnt/test/ost0 >>>> mount -t lustre /dev/sdb /mnt/test/ost0 >>>> >>>> but got this error: >>>> mount.lustre: mount /dev/sdb at /mnt/test/ost0 failed: Input/output >>>> error >>>> Is the MGS running? >>> >>> Was there any error messages in ''dmesg'' on the node? >>> >>> Isaac >>> >> >> Yes, log is below. Maybe it can be some incompatibility with Centos 5 (I >> used packages for Red Hat 5) >> >> Lustre: OBD class driver, info at clusterfs.com >> Lustre Version: 1.6.4.2 >> Build Version: >> 1.6.4.2-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.18.lustre.linux-2.6.18-8.1.14.el5_lustre.1.6.4.2smp >> >> Lustre: Added LNI 192.168.2.56 at tcp [8/256] >> Lustre: Accept secure, port 988 >> Lustre: Lustre Client File System; info at clusterfs.com >> kjournald starting. Commit interval 5 seconds >> LDISKFS FS on sdb, internal journal >> LDISKFS-fs: mounted filesystem with ordered data mode. >> SELinux: initialized (dev sdb, type ldiskfs), not configured for labeling >> kjournald starting. Commit interval 5 seconds >> LDISKFS FS on sdb, internal journal >> LDISKFS-fs: mounted filesystem with ordered data mode. >> LDISKFS-fs: file extents enabled >> LDISKFS-fs: mballoc enabled >> SELinux: initialized (dev sdb, type ldiskfs), not configured for labeling >> >> LustreError: 2891:0:(client.c:975:ptlrpc_expire_one_request()) @@@ >> timeout (sent at 1203589568, 5s ago) req at cbaa7600 x1/t0 >> o250->MGS at MGC192.168.2.55@tcp_0:26 lens 240/272 ref 1 fl Rpc:/0/0 rc >> 0/-22 >> LustreError: 2858:0:(obd_mount.c:954:server_register_target()) >> registration with the MGS failed (-5) >> LustreError: 2858:0:(obd_mount.c:1054:server_start_targets()) Required >> registration failed for testfs-OSTffff: -5 >> LustreError: 15f-b: Communication error with the MGS. Is the MGS >> running? 
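Error -113 is EHOSTUNREACH ("No route to host"). When the machines otherwise respond to ordinary ICMP ping, a common cause on RHEL/CentOS 5 is the default iptables policy rejecting the LNET acceptor port; the checks below are a generic sketch under that assumption, not a confirmed diagnosis for this setup:

service iptables status                           # is a host firewall active on the peer?
iptables -I INPUT -p tcp --dport 988 -j ACCEPT    # temporarily allow the LNET acceptor port (988)
grep lnet /etc/modprobe.conf                      # e.g. "options lnet networks=tcp0(eth0)" should name the right interface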