Wojciech Turek
2010-Jan-05 17:57 UTC
[Lustre-discuss] The client profile could not be read from the MGS
Hello everyone and Happy New Year,

On my MDS server I have two file systems, work and work2. Yesterday I reconfigured the file system named 'work' and ran writeconf in order to recreate its configuration logs. I ran writeconf while the other file system, work2, was running. Both file systems share the same MGS, and I think that writeconf cleared the CONFIGS directory on the MGS for both of them. I didn't see any problems immediately after I ran writeconf, until I unmounted work2 from one of the client servers. When I tried to mount it back, this message appeared:

    mount.lustre: mount 10.44.245.203@tcp:/work2 at /scratch2 failed: Invalid argument
    This may have multiple causes.
    Is 'work2' the correct filesystem name?
    Are the mount options correct?
    Check the syslog for more info.

And the syslog on the clients says:

    Jan 5 17:15:47 node-h01 kernel: LustreError: 156-2: The client profile 'work2-client' could not be read from the MGS. Does that filesystem exist?
    Jan 5 17:15:47 node-h01 kernel: LustreError: 7936:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
    Jan 5 17:15:47 node-h01 kernel: LustreError: 7936:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
    Jan 5 17:15:47 node-h01 kernel: Lustre: client ffff81016d4dd000 umount complete
    Jan 5 17:15:47 node-h01 kernel: LustreError: 7936:0:(obd_mount.c:1980:lustre_fill_super()) Unable to mount (-22)

I have done some searching and found one similar problem reported on this mailing list. The suggestion was to check whether the client profile file exists in the CONFIGS dir. On my MDS node I ran this command:

    debugfs -c -R 'ls -l CONFIGS' /dev/drbd_mds03_vg/mgs_lv
    debugfs 1.40.7.sun3 (28-Feb-2008)
    /dev/drbd_mds03_vg/mgs_lv: catastrophic mode - not reading inode or group bitmaps
    303105   40777 (2)  0  0   4096  4-Jan-2010 11:39 .
         2   40755 (2)  0  0   4096 22-May-2009 10:59 ..
    303106  100644 (1)  0  0  12288 22-May-2009 10:59 mountdata
    303107  100644 (1)  0  0  28704  4-Jan-2010 05:15 work-client
    303108  100644 (1)  0  0  27936  4-Jan-2010 05:15 work-MDT0000
    303109  100644 (1)  0  0   8880  4-Jan-2010 05:16 work-OST0000
    303110  100644 (1)  0  0   8880  4-Jan-2010 05:16 work-OST0001
    303111  100644 (1)  0  0   8880  4-Jan-2010 05:17 work-OST0002
    303112  100644 (1)  0  0   8880  4-Jan-2010 05:17 work-OST0003
    303113  100644 (1)  0  0   8880  4-Jan-2010 05:18 work-OST0004
    303114  100644 (1)  0  0   8880  4-Jan-2010 05:21 work-OST0005
    303115  100644 (1)  0  0   8880  4-Jan-2010 05:21 work-OST0006
    303116  100644 (1)  0  0   8880  4-Jan-2010 05:21 work-OST0007
    303117  100644 (1)  0  0   8880  4-Jan-2010 05:22 work-OST0008
    303118  100644 (1)  0  0   8880  4-Jan-2010 05:23 work-OST0009
    303119  100644 (1)  0  0   8880  4-Jan-2010 05:23 work-OST000a
    303120  100644 (1)  0  0   8880  4-Jan-2010 05:23 work-OST000b
    303121  100644 (1)  0  0      0  4-Jan-2010 11:39 work2-client

The work2-client file is zero size, and all the OST and MDT files for the work2 file system are missing.

Is there a way to recover these files without stopping the work2 file system?

If I umount all work2 OSTs and the MDT, run writeconf on them, and mount them back, would this recreate the missing files?

Also, can I do the above without unmounting the clients (let them wait until the Lustre targets come back), and would this kill any jobs running on them?

Many thanks for your input.

Cheers

Wojciech

--
Wojciech Turek
Assistant System Manager
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
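The manual check described above (look for an empty client profile and for missing MDT/OST logs) could be scripted roughly as follows. This is only a sketch: the function names `parse_configs` and `check_fs` are made up for illustration, the sample listing is abridged from the post, and the column layout is assumed to match the `debugfs ls -l` output shown in the message.

```python
import re

# One entry of a 'debugfs ls -l' listing, as it appears in the post:
# inode, mode, (type), uid, gid, size, date, time, name
LINE_RE = re.compile(
    r"^\s*(?P<inode>\d+)\s+(?P<mode>\d+)\s+\(\d+\)\s+\d+\s+\d+\s+"
    r"(?P<size>\d+)\s+\S+\s+\S+\s+(?P<name>\S+)\s*$"
)

# Abridged copy of the listing from the post.
SAMPLE = """\
debugfs 1.40.7.sun3 (28-Feb-2008)
303105   40777 (2)  0  0   4096  4-Jan-2010 11:39 .
     2   40755 (2)  0  0   4096 22-May-2009 10:59 ..
303106  100644 (1)  0  0  12288 22-May-2009 10:59 mountdata
303107  100644 (1)  0  0  28704  4-Jan-2010 05:15 work-client
303108  100644 (1)  0  0  27936  4-Jan-2010 05:15 work-MDT0000
303109  100644 (1)  0  0   8880  4-Jan-2010 05:16 work-OST0000
303121  100644 (1)  0  0      0  4-Jan-2010 11:39 work2-client
"""


def parse_configs(listing):
    """Return {name: size} for the regular files in the listing."""
    sizes = {}
    for line in listing.splitlines():
        m = LINE_RE.match(line)
        # Mode 100xxx marks a regular file; directories (40xxx) are skipped.
        if m and m.group("mode").startswith("100"):
            sizes[m.group("name")] = int(m.group("size"))
    return sizes


def check_fs(sizes, fsname):
    """Return a list of problems found for one file system's config logs."""
    problems = []
    client = f"{fsname}-client"
    if client not in sizes:
        problems.append(f"{client} is missing")
    elif sizes[client] == 0:
        problems.append(f"{client} is empty")
    # Target logs are named <fsname>-MDTxxxx / <fsname>-OSTxxxx in the post.
    target_re = re.compile(rf"{re.escape(fsname)}-(MDT|OST)[0-9a-f]{{4}}")
    kinds = {m.group(1) for m in map(target_re.fullmatch, sizes) if m}
    for kind in ("MDT", "OST"):
        if kind not in kinds:
            problems.append(f"no {fsname}-{kind}* log found")
    return problems


if __name__ == "__main__":
    sizes = parse_configs(SAMPLE)
    for fs in ("work", "work2"):
        print(fs, check_fs(sizes, fs) or "looks complete")
```

Feeding it the real output of the debugfs command instead of `SAMPLE` would flag work2 in the same way the listing was read by eye.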
Andreas Dilger
2010-Jan-05 21:00 UTC
[Lustre-discuss] The client profile could not be read from the MGS
On 2010-01-05, at 10:57, Wojciech Turek wrote:
> On my MDS node I ran this command:
> debugfs -c -R 'ls -l CONFIGS' /dev/drbd_mds03_vg/mgs_lv
> 303121 100644 (1) 0 0 0 4-Jan-2010 11:39 work2-client
>
> work2-client file is zero size and all the OST and MDT files for work2
> file system are missing.
>
> Is there a way to recover these files without stopping work2 file
> system?
>
> If I umount all work2 OSTs and MDT and then run writeconf on them and
> mount them back, would this recreate these missing files?

I suspect yes, though I'm not really the expert in the config code.

Could you please file a bug with details? It doesn't make sense to delete both configs when only one of them is being rewritten. It would also be useful in such cases to create a backup of the config and leave it on the MGS before deleting it.

> Also can I do the above without umounting clients (let them wait until
> lustre targets come back) and would this kill any jobs running on
> them?

It shouldn't, but I'm not totally sure what they will do with the new configuration itself. You will likely have to remount the clients at some point before you make any changes to the configuration in the future (e.g. adding an OST or setting tunables), as the currently-mounted clients will likely not detect these because of the new configuration that was created.

Cheers, Andreas

--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
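For reference, the sequence proposed in the question (unmount the work2 targets, run writeconf on each, mount them back) can be written out as a dry-run plan. This is a sketch only: `writeconf_plan` is a hypothetical helper, the device and mount-point names are invented, and the MDT-before-OSTs ordering is an assumption based on the usual writeconf procedure rather than anything stated in this thread. Clients are left out because the thread does not settle whether they need to be remounted right away. The function only builds command strings and executes nothing.

```python
def writeconf_plan(mdt, osts):
    """Build an ordered list of shell commands (not executed).

    mdt  -- (device, mountpoint) pair for the MDT
    osts -- iterable of (device, mountpoint) pairs for the OSTs
    """
    targets = [mdt, *osts]  # assumed order: MDT first, then OSTs
    plan = [f"umount {mnt}" for _dev, mnt in targets]
    plan += [f"tunefs.lustre --writeconf {dev}" for dev, _mnt in targets]
    plan += [f"mount -t lustre {dev} {mnt}" for dev, mnt in targets]
    return plan


if __name__ == "__main__":
    # Hypothetical devices and mount points, for illustration only.
    for cmd in writeconf_plan(
        ("/dev/example_vg/work2_mdt", "/mnt/work2-mdt"),
        [("/dev/example_vg/work2_ost0", "/mnt/work2-ost0")],
    ):
        print(cmd)
```

Printing the plan first and reviewing it by hand keeps the MGS, which also serves the 'work' file system, out of the loop until each step has been checked.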