aberoham at gmail.com
2008-Feb-22 00:47 UTC
[Lustre-discuss] LustreError: 15f-b, initial OST mount fails with "Input/output error"
I have a Lustre MGS/MDT hosting one Lustre filesystem with three OSTs attached. When trying to attach a fourth OST I see the following on the OST's console, and the mount command times out with "mount.lustre: mount /dev/lustre2/ost at /mnt/data/ost failed: Input/output error Is the MGS running?".

I am able to mount the Lustre filesystem on this un-attachable OST node as a client, and I am able to ping the MGS/MDT and vice versa.

# mkfs.lustre --reformat --fsname tmonster --ost --mgsnode=tm01@tcp0 --mkfsoptions='-N 1200000' /dev/lustre2/ost
...
# mount -t lustre /dev/lustre2/ost /mnt/data/ost
mount.lustre: mount /dev/lustre2/ost at /mnt/data/ost failed: Input/output error
Is the MGS running?

# dmesg
Lustre: OBD class driver, info@clusterfs.com
Lustre Version: 1.6.4.2
Build Version: 1.6.4.2-19691231190000-PRISTINE-.cache.build.BUILD.lustre-kernel-2.6.18.lustre.linux-2.6.18-8.1.14.el5_lustre.1.6.4.2smp
Lustre: Added LNI 192.168.33.5@tcp [8/256]
Lustre: Accept secure, port 988
Lustre: Lustre Client File System; info@clusterfs.com
kjournald starting. Commit interval 5 seconds
LDISKFS FS on dm-2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LustreError: 2535:0:(obd_mount.c:247:ldd_parse()) cannot open CONFIGS/mountdata: rc = -2
LustreError: 2535:0:(obd_mount.c:1252:server_kernel_mount()) premount parse options failed: rc = -2
LustreError: 2535:0:(obd_mount.c:1533:server_fill_super()) Unable to mount device /dev/lustre2/ost: -2
LustreError: 2535:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-2)
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
bond0: no IPv6 routers present
eth0: no IPv6 routers present
eth1: no IPv6 routers present
kjournald starting. Commit interval 5 seconds
LDISKFS FS on dm-2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
LDISKFS FS on dm-2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
LDISKFS FS on dm-2, internal journal
LDISKFS-fs: mounted filesystem with ordered data mode.
LDISKFS-fs: file extents enabled
LDISKFS-fs: mballoc enabled
LustreError: 3157:0:(client.c:975:ptlrpc_expire_one_request()) @@@ timeout (sent at 1203640178, 100s ago) req@ffff81007f5db600 x2/t0 o253->MGS@MGC192.168.33.1@tcp_0:26 lens 4672/4672 ref 1 fl Rpc:/0/0 rc 0/-22
LustreError: 166-1: MGC192.168.33.1@tcp: Connection to service MGS via nid 192.168.33.1@tcp was lost; in progress operations using this service will fail.
LustreError: 3157:0:(obd_mount.c:954:server_register_target()) registration with the MGS failed (-5)
LustreError: 3157:0:(obd_mount.c:1054:server_start_targets()) Required registration failed for tmonster-OSTffff: -5
LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
Lustre: MGC192.168.33.1@tcp: Reactivating import
LustreError: 3157:0:(obd_mount.c:1570:server_fill_super()) Unable to start targets: -5
Lustre: MGC192.168.33.1@tcp: Connection restored to service MGS using nid 192.168.33.1@tcp.
LustreError: 3157:0:(obd_mount.c:1368:server_put_super()) no obd tmonster-OSTffff
LustreError: 3157:0:(obd_mount.c:119:server_deregister_mount()) tmonster-OSTffff not registered
LustreError: 11-0: an error occurred while communicating with 192.168.33.1@tcp. The mgs_disconnect operation failed with -107
LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
LDISKFS-fs: mballoc: 0 generated and it took 0
LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
Lustre: server umount tmonster-OSTffff complete
LustreError: 3157:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-5)

MGS/MDT dmesg (some of these are certainly unrelated to the OST's mount command):

Lustre: 2807:0:(ldlm_lib.c:519:target_handle_reconnect()) tmonster-MDT0000: 3d7d98f5-470c-4188-8023-6c0023150148 reconnecting
Lustre: 2807:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 2 previous similar messages
Lustre: 2806:0:(mds_reint.c:362:mds_steal_ack_locks()) Stealing 1 locks from rs ffff81006ce30000 x895702.t458401 o101 NID 192.168.19.14@tcp
Lustre: 2813:0:(service.c:751:ptlrpc_server_handle_reply()) All locks stolen from rs ffff81006ce30000 x895702.t458401 o101 NID 192.168.19.14@tcp
Lustre: 2797:0:(mds_reint.c:362:mds_steal_ack_locks()) Stealing 1 locks from rs ffff81007f93d000 x666557.t458402 o101 NID 192.168.19.15@tcp
Lustre: 2817:0:(service.c:751:ptlrpc_server_handle_reply()) All locks stolen from rs ffff81007f93d000 x666557.t458402 o101 NID 192.168.19.15@tcp
Lustre: 2688:0:(router.c:167:lnet_notify()) Ignoring prediction from 192.168.33.1@tcp of 192.168.33.5@tcp down 7854805405 seconds in the future
Lustre: 2780:0:(ldlm_lib.c:519:target_handle_reconnect()) MGS: 8eba281a-43bd-3fa2-2491-fbab892dc02c reconnecting
Lustre: 2780:0:(ldlm_lib.c:519:target_handle_reconnect()) Skipped 2 previous similar messages
Lustre: MGS: haven't heard from client 8eba281a-43bd-3fa2-2491-fbab892dc02c (at 192.168.33.5@tcp) in 72 seconds. I think it's dead, and I am evicting it.
LustreError: 2780:0:(mgs_handler.c:515:mgs_handle()) lustre_mgs: operation 251 on unconnected MGS
LustreError: 2780:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-107) req@ffff810073150050 x7/t0 o251-><?>@<?>:-1 lens 128/0 ref 0 fl Interpret:/0/0 rc -107/0
LustreError: 2780:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 2 previous similar messages

Mounting the desired Lustre filesystem as a client on the OST node that is having problems:

# mount -t lustre tm01@tcp0:/tmonster /mnt/tmonster
# df -h /mnt/tmonster
Filesystem            Size  Used Avail Use% Mounted on
tm01@tcp0:/tmonster   2.8T  181G  2.6T   7% /mnt/tmonster

I have replaced this OST's hardware (using the same boot/OST disks in a different blade) to no avail. Any help is highly appreciated.

Thanks,
Abe
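The negative values in the log lines of this message (rc = -2, -5, -22, -107) are negated Linux errno codes, which is why -5 surfaces in user space as "Input/output error". The following is a minimal sketch for decoding them, assuming it runs on a Linux host so that the numbering matches the kernel's; `describe_rc` is a hypothetical helper name, not part of Lustre.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Describe a negative kernel-style return code such as -5.

    Assumption: the script runs on Linux, so errno numbers match
    the ones the Lustre kernel modules logged.
    """
    code = abs(rc)
    # errno.errorcode maps numbers to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{rc}: {name} ({os.strerror(code)})"


# Return codes that appear in the dmesg output quoted in this thread.
for rc in (-2, -5, -22, -107):
    print(describe_rc(rc))
```

Read this way, the first failed attempt (-2, no CONFIGS/mountdata on the device) and the later registration failure (-5, after the request to the MGS timed out) are two separate problems rather than one.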
aberoham at gmail.com
2008-Mar-08 07:55 UTC
[Lustre-discuss] LustreError: 15f-b, initial OST mount fails with "Input/output error"
Following up on the below: the problem was a broken Ethernet configuration. I think one eth interface had an MTU of 1500 and the other was set to 9000, or some similar mismatch. Pings made it through the bond0 balance-alb pair, but TCP/IP traffic apparently did not; it was something along those lines.

On Thu, Feb 21, 2008 at 4:47 PM, <aberoham at gmail.com> wrote:
> I have a lustre MGS/MDT hosting one lustre filesystem with three OSTs
> attached. When trying to attach a fourth OST I see the following on the
> OST's console and the mount command times out with "mount.lustre: mount
> /dev/lustre2/ost at /mnt/data/ost failed: Input/output error Is the MGS
> running?".
>
> I am able to mount the lustre filesystem on this un-attachable OST node as
> a client and am able to ping the MGS/MDT and vice versa.
>
> [...]
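An MTU mismatch between the slaves of a bond, as described in the follow-up, can be spotted by comparing the values the kernel exposes under /sys/class/net. The following is a minimal sketch, assuming a Linux host and the interface names bond0, eth0 and eth1 seen in the dmesg output; the helper names `read_mtus`, `mtu_mismatch` and `max_ping_payload` are hypothetical.

```python
from pathlib import Path


def read_mtus(interfaces, sysfs_root="/sys/class/net"):
    """Return {interface: mtu} for each interface that exists.

    Assumption: Linux sysfs layout, <sysfs_root>/<iface>/mtu.
    Interfaces without an mtu file are skipped.
    """
    mtus = {}
    for iface in interfaces:
        mtu_file = Path(sysfs_root) / iface / "mtu"
        if mtu_file.is_file():
            mtus[iface] = int(mtu_file.read_text().strip())
    return mtus


def mtu_mismatch(mtus):
    """True when the interfaces do not all share a single MTU."""
    return len(set(mtus.values())) > 1


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits one unfragmented IPv4 packet.

    20 bytes of IPv4 header (no options) plus 8 bytes of ICMP header.
    """
    return mtu - 28


# Interface names taken from the dmesg output in this thread.
mtus = read_mtus(["bond0", "eth0", "eth1"])
for iface, mtu in sorted(mtus.items()):
    print(f"{iface}: mtu {mtu}, max unfragmented ping payload {max_ping_payload(mtu)}")
if mtu_mismatch(mtus):
    print("MTU mismatch between interfaces")
```

The payload figure is meant for a ping with fragmentation prohibited, for example `ping -M do -s 8972 <mgs-host>` with the Linux iputils ping when 9000-byte frames are expected end to end. A default ping sends only a small payload and succeeds at either MTU, which would be consistent with pings working while the 4672-byte registration request to the MGS timed out, although the thread does not confirm that this was the exact mechanism.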