Andreas Dilger
2006-May-19 07:36 UTC
[Lustre-discuss] Lustre 1.4.3 - has a startup problem on mds.
On Jul 23, 2005 15:53 +0900, Jongmin, Lee wrote:
> I installed lustre 1.4.3 and it consist of one MDS, two OSTs.
> OSTs started well. but, MDS had a message like below and then suspended
> processing.
>
> add_uuid NID_192.168.0.122_UUID 192.168.0.122 tcp <- suspended.

Could you please include the output from dmesg or /var/log/messages from
the MDS startup?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Jongmin, Lee
2006-May-19 07:36 UTC
[Lustre-discuss] Lustre 1.4.3 - has a startup problem on mds.
Hello,

I installed Lustre 1.4.3; it consists of one MDS and two OSTs.
The configuration is as follows:

--add net --node mds --nid mds --nettype tcp
--add net --node ost1 --nid ost1 --nettype tcp
--add net --node ost2 --nid ost2 --nettype tcp
--add net --node client --nid * --nettype tcp
--add mds --node mds --mds mds1 --fstype ldiskfs --dev /dev/hda5
--add lov --lov lov1 --mds mds1 --stripe_sz 262144 --stripe_cnt 1 --stripe_pattern 0
--add ost --node ost1 --lov lov1 --ost ost1 --fstype ldiskfs --dev /dev/hda2
--add ost --node ost2 --lov lov1 --ost ost2 --fstype ldiskfs --dev /dev/hda2
--add mtpt --node client --path /mnt/lustre --mds mds1 --lov lov1

The OSTs started fine, but the MDS printed the messages below and then
suspended processing:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
...
Service: network NET_mds_tcp NET_mds_tcp_UUID
NETWORK: NET_mds_tcp NET_mds_tcp_UUID tcp 192.168.0.121 988
+ /usr/sbin/lctl
 network tcp
 mynid 192.168.0.121
 quit
Service: ldlm ldlm ldlm_UUID
Service: mdsdev MDD_mds1_mds MDD_mds1_mds_UUID
stripe_count %d, inode_size %d 1 512
MDSDEV: mds1 mds1_UUID /dev/hda2 ldiskfs no
+ sfdisk -s /dev/hda2
+ mkfs.ext2 -j -b 4096 -F -i 4096 -J size=400 -I 512 /dev/hda2
+ tune2fs -O dir_index /dev/hda2
/usr/sbin/lctl
 attach mds mds1 mds1_UUID
 quit
+ /usr/sbin/lctl
 cfg_device mds1
 setup /dev/hda2 ldiskfs
 quit
recording clients for filesystem: FS_fsname_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0xb7884b6c>, 0, 1, 1), (<__main__.OSC instance at 0xb789104c>, 1, 1, 1)] [u'ost1_UUID_2', u'ost2_UUID_2'] 1
+ /usr/sbin/lctl
 device $mds1
 probe
 clear_log mds1
 quit
Recording log mds1 on mds1
dbg LOV prepare
dbg LOV prepare: [(<__main__.OSC instance at 0xb7884b6c>, 0, 1, 1), (<__main__.OSC instance at 0xb789104c>, 1, 1, 1)] [u'ost1_UUID_2', u'ost2_UUID_2']
LOV: lov_mds1 94104_lov_mds1_e104bf19b2 mds1_UUID 1 262144 0 0 [u'ost1_UUID_2', u'ost2_UUID_2'] mds1
+ /usr/sbin/lctl
 device $mds1
 record mds1
 attach lov lov_mds1 94104_lov_mds1_e104bf19b2
 lov_setup lov1_UUID 1 262144 0 0
 quit
OSC: OSC_mds_ost1_mds1 94104_lov_mds1_e104bf19b2 ost1_UUID_2
+ /usr/sbin/lctl
 device $mds1
 record mds1
 add_uuid NID_192.168.0.122_UUID 192.168.0.122 tcp <- suspended.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

What's wrong? Any help would be appreciated.
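[Editor's note: for context, in Lustre 1.4 a configuration like the one above is normally built by passing each option line to lmc, which accumulates an XML description that lconf then applies on each node. A minimal sketch of such a script, using the node names, devices, and LOV parameters from the message above (the script structure, the config.xml file name, and the -o/-m flag usage are assumptions based on typical Lustre 1.4 setups, not taken from the original post):]

```shell
#!/bin/sh
# Sketch of a Lustre 1.4-style configuration script (structure assumed;
# node names, devices, and LOV parameters taken from the post above).
CONFIG=config.xml

# Network definitions for each node (-o creates the file, -m modifies it)
lmc -o $CONFIG --add net --node mds    --nid mds  --nettype tcp
lmc -m $CONFIG --add net --node ost1   --nid ost1 --nettype tcp
lmc -m $CONFIG --add net --node ost2   --nid ost2 --nettype tcp
lmc -m $CONFIG --add net --node client --nid '*'  --nettype tcp

# MDS, LOV, and the two OSTs
lmc -m $CONFIG --add mds --node mds --mds mds1 --fstype ldiskfs --dev /dev/hda5
lmc -m $CONFIG --add lov --lov lov1 --mds mds1 --stripe_sz 262144 \
    --stripe_cnt 1 --stripe_pattern 0
lmc -m $CONFIG --add ost --node ost1 --lov lov1 --ost ost1 \
    --fstype ldiskfs --dev /dev/hda2
lmc -m $CONFIG --add ost --node ost2 --lov lov1 --ost ost2 \
    --fstype ldiskfs --dev /dev/hda2

# Client mountpoint
lmc -m $CONFIG --add mtpt --node client --path /mnt/lustre --mds mds1 --lov lov1

# Each node then applies the config, e.g. on the MDS:
#   lconf --reformat --node mds config.xml
```

[The log above shows the hang occurring while lconf is recording the MDS configuration log via lctl, at the add_uuid step for 192.168.0.122 (ost1's NID), which is why Andreas asks for dmesg/syslog output from the MDS at that point.]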