Hi,

After my MDS crashed I was unable to mount the MDT/MGS. The dmesg output is below. I'm unable to remove the Lustre modules (lustre_rmmod), and it's listed under /proc/fs/lustre/devices but not mounted. Rebooting the system to try again results in a kernel panic. After a reset I ran fsck, which revealed no problems, so I tried a --writeconf and deleted CATALOGS, but I still received -17 and was unable to reboot cleanly or unload the modules.

Fortunately this is my test system, but I'd like to understand what happened! Running Lustre 1.8.5 on RHEL 5.5.

cat /proc/fs/lustre/devices
  7 AT osc test-OST0000-osc test-mdtlov_UUID 1

Lustre: MGS MGS started
Lustre: MGC192.168.5.100@o2ib: Reactivating import
Lustre: MGC192.168.5.100@o2ib: Reactivating import
Lustre: Enabling user_xattr
Lustre: test-MDT0000: Now serving test-MDT0000 on /dev/sda1 with recovery enabled
Lustre: 5590:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) test-MDT0000: group upcall set to /usr/sbin/l_getgroups
Lustre: test-MDT0000.mdt: set parameter group_upcall=/usr/sbin/l_getgroups
LustreError: 5590:0:(ldlm_lib.c:331:client_obd_setup()) can't add initial connection
LustreError: 5590:0:(obd_config.c:372:class_setup()) setup test-OST0000-osc failed (-2)
LustreError: 5590:0:(obd_config.c:1199:class_config_llog_handler()) Err -2 on cfg command:
Lustre: cmd=cf003 0:test-OST0000-osc 1:test-OST0000_UUID 2:128.174.5.100@tcp
LustreError: 15c-8: MGC192.168.5.100@o2ib: The configuration from log 'test-MDT0000' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
LustreError: 5453:0:(obd_mount.c:1126:server_start_targets()) failed to start server test-MDT0000: -2
LustreError: 5453:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -2
Lustre: Failing over test-MDT0000
Lustre: Failing over test-mdtlov
Lustre: test-MDT0000: shutting down for failover; client state will be preserved.
Lustre: MDT test-MDT0000 has stopped.
Lustre: MGS has stopped.
Lustre: server umount test-MDT0000 complete
LustreError: 5453:0:(obd_mount.c:2050:lustre_fill_super()) Unable to mount (-2)

Thanks,
Dan
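
For reference, the negative numbers in the messages above (-2 in the log, -17 in the description) follow the usual Linux kernel convention of returning a negated errno value. A minimal sketch for decoding them with the Python standard library is shown here; the helper name describe is made up for illustration, and the symbolic names say nothing about which Lustre code path produced the error.

```python
import errno
import os

def describe(rc):
    """Map a negative kernel-style return code to its errno name.

    'describe' is a hypothetical helper, not part of any Lustre tool.
    """
    return errno.errorcode.get(-rc, "unknown")

# The two codes mentioned in the report above.
for rc in (-2, -17):
    print(rc, describe(rc), os.strerror(-rc))
```

On Linux, -2 maps to ENOENT and -17 to EEXIST.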

On Thu, May 26, 2011 at 10:27:26AM -0700, Dan wrote:
> Lustre: MGS MGS started
> Lustre: MGC192.168.5.100@o2ib: Reactivating import
> Lustre: MGC192.168.5.100@o2ib: Reactivating import

So you use infiniband ...

[...]
> LustreError: 5590:0:(ldlm_lib.c:331:client_obd_setup()) can't add initial connection
> LustreError: 5590:0:(obd_config.c:372:class_setup()) setup test-OST0000-osc failed (-2)
> LustreError: 5590:0:(obd_config.c:1199:class_config_llog_handler()) Err -2 on cfg command:
> Lustre: cmd=cf003 0:test-OST0000-osc 1:test-OST0000_UUID 2:128.174.5.100@tcp

But a tcp nid is registered for OST0000. Is this intended? If so, have you configured lnet on the MDS to use tcp?

Cheers,
Johann
-- 
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
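
The mismatch pointed out above (MGS reached over o2ib, OST0000 registered with a tcp NID) can be spotted mechanically. The following is only a rough sketch: it pulls every `address@network` NID out of a log excerpt and reports those whose network type is not in a given set of locally configured networks. The function names are invented for illustration, the local network set is an assumption that would in practice come from something like `lctl list_nids` on the MDS, and network name aliases such as tcp versus tcp0 are not normalized.

```python
import re

# A NID has the form "<address>@<network>", e.g. 192.168.5.100@o2ib.
# Only IPv4-style addresses are handled in this sketch.
NID_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})@([a-z0-9]+)")

def nids_in(text):
    """Return the set of (address, network) pairs found in text."""
    return set(NID_RE.findall(text))

def unreachable_nids(log_text, local_networks):
    """Return NIDs whose network type is not configured locally."""
    return sorted(n for n in nids_in(log_text) if n[1] not in local_networks)

# Two lines taken from the dmesg output in this thread.
log = """\
Lustre: MGC192.168.5.100@o2ib: Reactivating import
Lustre: cmd=cf003 0:test-OST0000-osc 1:test-OST0000_UUID 2:128.174.5.100@tcp
"""

# Assumption: LNet on the MDS is configured for o2ib only.
print(unreachable_nids(log, {"o2ib"}))
```

With only o2ib assumed to be local, the sketch flags 128.174.5.100@tcp, the NID recorded for test-OST0000 in the configuration log.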