Andreas Dilger
2006-May-19 07:36 UTC
[Lustre-discuss] Lustre 1.4.3 - has a startup problem on mds.
On Jul 23, 2005 15:53 +0900, Jongmin, Lee wrote:
> I installed Lustre 1.4.3, and it consists of one MDS and two OSTs.
> The OSTs started well, but the MDS printed the message below and then
> suspended processing.
>
> add_uuid NID_192.168.0.122_UUID 192.168.0.122 tcp <- suspended.

Could you please include the output from dmesg or /var/log/messages from
the MDS startup?

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Jongmin, Lee
2006-May-19 07:36 UTC
[Lustre-discuss] Lustre 1.4.3 - has a startup problem on mds.
Hello,
I installed Lustre 1.4.3, and it consists of one MDS and two OSTs.
The configuration is as follows:
--add net --node mds --nid mds --nettype tcp
--add net --node ost1 --nid ost1 --nettype tcp
--add net --node ost2 --nid ost2 --nettype tcp
--add net --node client --nid * --nettype tcp
--add mds --node mds --mds mds1 --fstype ldiskfs --dev /dev/hda5
--add lov --lov lov1 --mds mds1 --stripe_sz 262144 --stripe_cnt 1 --stripe_pattern 0
--add ost --node ost1 --lov lov1 --ost ost1 --fstype ldiskfs --dev /dev/hda2
--add ost --node ost2 --lov lov1 --ost ost2 --fstype ldiskfs --dev /dev/hda2
--add mtpt --node client --path /mnt/lustre --mds mds1 --lov lov1
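For reference, I generate and start everything with the usual lmc/lconf
workflow, roughly like this (a sketch from memory; "config.xml" is just
the file name I use):

# build the XML configuration from the directives above
lmc -m config.xml --add net --node mds --nid mds --nettype tcp
lmc -m config.xml --add net --node ost1 --nid ost1 --nettype tcp
lmc -m config.xml --add net --node ost2 --nid ost2 --nettype tcp
lmc -m config.xml --add mds --node mds --mds mds1 --fstype ldiskfs --dev /dev/hda5
# ... and so on for the lov, ost, and mtpt directives above ...

# start the services from the same file on each node
lconf --reformat --node ost1 config.xml   # on ost1 (likewise ost2)
lconf --reformat --node mds config.xml    # on the MDS; this is where it hangs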
The OSTs started well, but the MDS printed the message below and then
suspended processing.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
...
Service: network NET_mds_tcp NET_mds_tcp_UUID
NETWORK: NET_mds_tcp NET_mds_tcp_UUID tcp 192.168.0.121 988
+ /usr/sbin/lctl
network tcp
mynid 192.168.0.121
quit
Service: ldlm ldlm ldlm_UUID
Service: mdsdev MDD_mds1_mds MDD_mds1_mds_UUID stripe_count %d, inode_size %d 1 512
MDSDEV: mds1 mds1_UUID /dev/hda2 ldiskfs no
+ sfdisk -s /dev/hda2
+ mkfs.ext2 -j -b 4096 -F -i 4096 -J size=400 -I 512 /dev/hda2
+ tune2fs -O dir_index /dev/hda2
+ /usr/sbin/lctl
attach mds mds1 mds1_UUID
quit
+ /usr/sbin/lctl
cfg_device mds1
setup /dev/hda2 ldiskfs
quit
recording clients for filesystem: FS_fsname_UUID
get_lov_tgts failed, using get_refs
dbg LOV __init__: [(<__main__.OSC instance at 0xb7884b6c>, 0, 1, 1), (<__main__.OSC instance at 0xb789104c>, 1, 1, 1)] [u'ost1_UUID_2', u'ost2_UUID_2'] 1
+ /usr/sbin/lctl
device $mds1
probe
clear_log mds1
quit
Recording log mds1 on mds1
dbg LOV prepare
dbg LOV prepare: [(<__main__.OSC instance at 0xb7884b6c>, 0, 1, 1), (<__main__.OSC instance at 0xb789104c>, 1, 1, 1)] [u'ost1_UUID_2', u'ost2_UUID_2']
LOV: lov_mds1 94104_lov_mds1_e104bf19b2 mds1_UUID 1 262144 0 0 [u'ost1_UUID_2', u'ost2_UUID_2'] mds1
+ /usr/sbin/lctl
device $mds1
record mds1
attach lov lov_mds1 94104_lov_mds1_e104bf19b2
lov_setup lov1_UUID 1 262144 0 0
quit
OSC: OSC_mds_ost1_mds1 94104_lov_mds1_e104bf19b2 ost1_UUID_2
+ /usr/sbin/lctl
device $mds1
record mds1
add_uuid NID_192.168.0.122_UUID 192.168.0.122 tcp <- suspended.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
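If it helps, I can also capture the Lustre kernel debug log while lconf
is stuck; I believe something like this would work (a sketch, assuming
lctl's debug_kernel/dk command):

# on the MDS, while the add_uuid step is hung:
lctl dk /tmp/lustre-debug.log    # dump the kernel debug buffer to a file
dmesg > /tmp/mds-dmesg.txt       # plus the console messages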
What's wrong?
Any help would be appreciated.