Hi everyone,

I currently have a serious problem with the OST-MDS connection. My Lustre file system has 1 MDS and 3 OSTs (each MDS and OST has a backup node, synchronized by DRBD). Yesterday, possibly because my colleague moved the CATALOGS file while our devices were mounted as ldiskfs, everything went down: none of our OSTs can connect to the MDS. I tried unmounting everything and remounting, but it didn't help.

The disks mount fine on the MDS and the OSTs, but after recovery we see errors like this in the MDS log:

    Sep 26 05:46:51 MDS1 kernel: LustreError: 6161:0:(mds_lov.c:984:__mds_lov_synchronize()) lustre-OST0003_UUID failed at update_mds: -22

The MDS then deactivates our OSTs, so all of them are in the INACTIVE state on the MDS:

    lctl dl
      0 UP mgs MGS MGS 15
      1 UP mgc MGC192.168.1.78@tcp dd7b40bd-ab09-d972-7e3a-fc62205b4968 5
      2 UP mdt MDS MDS_uuid 3
      3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
      4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 7
      5 IN osc lustre-OST0003-osc lustre-mdtlov_UUID 5
      6 IN osc lustre-OST0000-osc lustre-mdtlov_UUID 5
      7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
      8 IN osc lustre-OST0005-osc lustre-mdtlov_UUID 5
      9 IN osc lustre-OST0004-osc lustre-mdtlov_UUID 5

Because of the rc -22 report, I tried changing the parameters on our OSTs (in fact, I only erased them and set them back to the old values; they worked well for 4 months, so I don't think the parameters themselves are the problem). That didn't help and produced another error. When I mount one of my OSTs (the parameters on both the OST and the MDS were adjusted with tunefs.lustre), I get this:

    mount.lustre: mount /dev/sdc at /mnt/lustre failed: Input/output error
    Is the MGS running?

The OST and MDS can reach each other without any problem, by both ping and lctl ping!

I also mounted my MDT as ldiskfs and removed CATALOGS and CONFIGS, which didn't help either :(

After trying all of this in vain, I reformatted the OST and MDS like this:

    mkfs.lustre --reformat --verbose --writeconf --ost --mgsnode=192.168.1.78@tcp:192.168.1.80@tcp --failover=192.168.1.82@tcp --index=1 /dev/sdc
    mkfs.lustre --reformat --mgs --mdt --failover=192.168.1.80@tcp --writeconf /dev/sda4

After the reformat nothing has changed; I still get the "Is the MGS running?" error :(

Given all the problems described above, could you please give me some advice or a solution? This is really a disaster for me right now.

Is there any way to fix the "failed at update_mds: -22" error?
Is there any way to fix the "Is the MGS running?" error?
I still have all of my data on the MGS backup node (it has the same problem as MDS1, but it was not reformatted). Could anyone please show me how to move it safely to my new MDS?

Any help would be highly appreciated :( I hope you can reply as soon as possible.

Many thanks
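
P.S. In case it helps anyone checking the same symptom on their own nodes, here is a minimal Python sketch that picks the inactive devices out of saved "lctl dl" output. It assumes the six-column layout seen in the listing above (index, status, type, name, UUID, reference count); the names parse_lctl_dl and inactive_devices are made up for this example and are not part of Lustre.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    """One row of `lctl dl` output (column meaning assumed from the sample)."""
    number: int
    status: str      # "UP" or "IN" in the sample above
    kind: str        # e.g. mgs, mgc, mdt, lov, mds, osc
    name: str
    uuid: str
    refcount: int


def parse_lctl_dl(text: str) -> List[Device]:
    """Parse saved `lctl dl` output, skipping lines that do not have six fields."""
    devices = []
    for line in text.splitlines():
        fields = line.split()
        # Expect: index status type name uuid refcount
        if len(fields) != 6 or not fields[0].isdigit() or not fields[5].isdigit():
            continue
        devices.append(
            Device(int(fields[0]), fields[1], fields[2],
                   fields[3], fields[4], int(fields[5]))
        )
    return devices


def inactive_devices(text: str) -> List[str]:
    """Return the names of all devices whose status column is IN."""
    return [d.name for d in parse_lctl_dl(text) if d.status == "IN"]


# Sample copied from the listing in this post.
SAMPLE = """\
  0 UP mgs MGS MGS 15
  1 UP mgc MGC192.168.1.78@tcp dd7b40bd-ab09-d972-7e3a-fc62205b4968 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 7
  5 IN osc lustre-OST0003-osc lustre-mdtlov_UUID 5
  6 IN osc lustre-OST0000-osc lustre-mdtlov_UUID 5
  7 IN osc lustre-OST0006-osc lustre-mdtlov_UUID 5
  8 IN osc lustre-OST0005-osc lustre-mdtlov_UUID 5
  9 IN osc lustre-OST0004-osc lustre-mdtlov_UUID 5
"""

if __name__ == "__main__":
    for name in inactive_devices(SAMPLE):
        print(name)
```

The script only reads text, so it can be run on output saved to a file without touching the file system itself.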