Wojciech Turek
2010-Jan-05 17:57 UTC
[Lustre-discuss] The client profile could not be read from the MGS
Hello everyone and Happy New Year,
On my MDS server I have two file systems, work and work2. Yesterday I
reconfigured the file system named 'work' and ran writeconf in order to
recreate its configuration logs. I ran writeconf while the other file
system, work2, was running. Both file systems share the same MGS, and I
think that writeconf cleared the CONFIGS directory on the MGS for both of
them. I didn't see any problems immediately after I ran writeconf,
until I unmounted work2 from one of the client servers. When I tried
to mount it back, this message appeared:
mount.lustre: mount 10.44.245.203@tcp:/work2 at /scratch2 failed: Invalid argument
This may have multiple causes.
Is ''work2'' the correct filesystem name?
Are the mount options correct?
Check the syslog for more info.
And the syslog on the clients says:
Jan 5 17:15:47 node-h01 kernel: LustreError: 156-2: The client profile 'work2-client' could not be read from the MGS. Does that filesystem exist?
Jan 5 17:15:47 node-h01 kernel: LustreError: 7936:0:(ldlm_request.c:996:ldlm_cli_cancel_req()) Got rc -108 from cancel RPC: canceling anyway
Jan 5 17:15:47 node-h01 kernel: LustreError: 7936:0:(ldlm_request.c:1605:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -108
Jan 5 17:15:47 node-h01 kernel: Lustre: client ffff81016d4dd000 umount complete
Jan 5 17:15:47 node-h01 kernel: LustreError: 7936:0:(obd_mount.c:1980:lustre_fill_super()) Unable to mount (-22)
I did some searching and found one similar problem reported on this
mailing list. The suggestion was to check whether the client profile
file exists in the CONFIGS directory.
On my MDS node I ran this command:
debugfs -c -R 'ls -l CONFIGS' /dev/drbd_mds03_vg/mgs_lv
debugfs 1.40.7.sun3 (28-Feb-2008)
/dev/drbd_mds03_vg/mgs_lv: catastrophic mode - not reading inode or group bitmaps
303105 40777 (2) 0 0 4096 4-Jan-2010 11:39 .
2 40755 (2) 0 0 4096 22-May-2009 10:59 ..
303106 100644 (1) 0 0 12288 22-May-2009 10:59 mountdata
303107 100644 (1) 0 0 28704 4-Jan-2010 05:15 work-client
303108 100644 (1) 0 0 27936 4-Jan-2010 05:15 work-MDT0000
303109 100644 (1) 0 0 8880 4-Jan-2010 05:16 work-OST0000
303110 100644 (1) 0 0 8880 4-Jan-2010 05:16 work-OST0001
303111 100644 (1) 0 0 8880 4-Jan-2010 05:17 work-OST0002
303112 100644 (1) 0 0 8880 4-Jan-2010 05:17 work-OST0003
303113 100644 (1) 0 0 8880 4-Jan-2010 05:18 work-OST0004
303114 100644 (1) 0 0 8880 4-Jan-2010 05:21 work-OST0005
303115 100644 (1) 0 0 8880 4-Jan-2010 05:21 work-OST0006
303116 100644 (1) 0 0 8880 4-Jan-2010 05:21 work-OST0007
303117 100644 (1) 0 0 8880 4-Jan-2010 05:22 work-OST0008
303118 100644 (1) 0 0 8880 4-Jan-2010 05:23 work-OST0009
303119 100644 (1) 0 0 8880 4-Jan-2010 05:23 work-OST000a
303120 100644 (1) 0 0 8880 4-Jan-2010 05:23 work-OST000b
303121 100644 (1) 0 0 0 4-Jan-2010 11:39 work2-client
The work2-client file is zero bytes in size, and all of the MDT and OST
files for the work2 file system are missing.
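The check described above (does the client profile exist, and is it non-empty?) can be automated with a small Python sketch. It is not a Lustre tool: it parses a pasted excerpt of the debugfs listing, assuming the nine-column layout that debugfs 1.40.7 printed here, and reports a missing or empty `<fsname>-client` log and the absence of any MDT or OST logs. The function names `config_logs` and `problems` are hypothetical; in practice you would feed it the captured debugfs output rather than a hard-coded string.

```python
import re

# Excerpt of the `debugfs -c -R 'ls -l CONFIGS'` output above (not every line).
LISTING = """\
 303105 40777 (2) 0 0 4096 4-Jan-2010 11:39 .
 303106 100644 (1) 0 0 12288 22-May-2009 10:59 mountdata
 303107 100644 (1) 0 0 28704 4-Jan-2010 05:15 work-client
 303108 100644 (1) 0 0 27936 4-Jan-2010 05:15 work-MDT0000
 303109 100644 (1) 0 0 8880 4-Jan-2010 05:16 work-OST0000
 303120 100644 (1) 0 0 8880 4-Jan-2010 05:23 work-OST000b
 303121 100644 (1) 0 0 0 4-Jan-2010 11:39 work2-client
"""


def config_logs(listing):
    """Map each file name in a debugfs `ls -l` listing to its size in bytes."""
    logs = {}
    for line in listing.splitlines():
        fields = line.split()
        # Assumed columns: inode, mode, (links), uid, gid, size, date, time, name.
        # Lines that do not have exactly nine columns are skipped.
        if len(fields) != 9:
            continue
        logs[fields[8]] = int(fields[5])
    return logs


def problems(logs, fsname):
    """Return a list of problems found with one file system's config logs."""
    found = []
    client = f"{fsname}-client"
    if client not in logs:
        found.append(f"{client} is missing")
    elif logs[client] == 0:
        found.append(f"{client} is empty (0 bytes)")
    # Target logs look like <fsname>-MDT0000 or <fsname>-OST000b (4 hex digits).
    pattern = re.compile(rf"{re.escape(fsname)}-(MDT|OST)[0-9a-f]{{4}}")
    kinds = set()
    for name in logs:
        match = pattern.fullmatch(name)
        if match:
            kinds.add(match.group(1))
    for kind in ("MDT", "OST"):
        if kind not in kinds:
            found.append(f"no {fsname}-{kind} logs")
    return found


if __name__ == "__main__":
    logs = config_logs(LISTING)
    for fs in ("work", "work2"):
        print(fs, problems(logs, fs) or "OK")
```

Matching on the full `<fsname>-` prefix keeps 'work' and 'work2' apart, which matters here because both file systems keep their logs in the same CONFIGS directory on the shared MGS.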
Is there a way to recover these files without stopping the work2 file system?
If I unmount the work2 MDT and all work2 OSTs, run writeconf on them, and
mount them back, would that recreate the missing files?
Also, can I do the above without unmounting the clients (letting them wait
until the Lustre targets come back), and would this kill any jobs running
on them?
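The procedure asked about above can be sketched as a dry-run plan that only prints commands. The ordering follows the general writeconf procedure in the Lustre Operations Manual as we understand it: unmount the clients, then the MDT, then the OSTs; run `tunefs.lustre --writeconf` on the MDT first and then on each OST; remount the MDT, then the OSTs, then the clients. That is more conservative than leaving the clients mounted. The function name `writeconf_plan` and all work2 device paths and target mount points are hypothetical; only the MGS NID 10.44.245.203@tcp and /scratch2 come from this thread. The separate, shared MGS is assumed to stay mounted throughout, since 'work' still depends on it.

```python
"""Dry-run plan for regenerating one file system's Lustre config logs.

It prints commands only and executes nothing. Each command is meant to be
run on the node that owns the corresponding target or client mount.
"""


def writeconf_plan(fsname, mgs_nid, mdt, osts, clients):
    """Return the commands, in order, as a list of strings.

    mdt:     (device, mountpoint) of the MDT
    osts:    list of (device, mountpoint) pairs, one per OST
    clients: list of client mount points
    """
    mdt_dev, mdt_mnt = mdt
    plan = []
    # 1. Stop the file system: clients first, then the MDT, then the OSTs.
    plan += [f"umount {mnt}" for mnt in clients]
    plan.append(f"umount {mdt_mnt}")
    plan += [f"umount {mnt}" for _, mnt in osts]
    # 2. Regenerate the config logs: MDT first, then every OST.
    plan.append(f"tunefs.lustre --writeconf {mdt_dev}")
    plan += [f"tunefs.lustre --writeconf {dev}" for dev, _ in osts]
    # 3. Restart: MDT, then OSTs, then clients (the MGS is assumed to be up).
    plan.append(f"mount -t lustre {mdt_dev} {mdt_mnt}")
    plan += [f"mount -t lustre {dev} {mnt}" for dev, mnt in osts]
    plan += [f"mount -t lustre {mgs_nid}:/{fsname} {mnt}" for mnt in clients]
    return plan


if __name__ == "__main__":
    # Hypothetical work2 devices and target mount points.
    for cmd in writeconf_plan(
        "work2",
        "10.44.245.203@tcp",
        mdt=("/dev/work2_mdt", "/mnt/work2/mdt"),
        osts=[("/dev/work2_ost0", "/mnt/work2/ost0"),
              ("/dev/work2_ost1", "/mnt/work2/ost1")],
        clients=["/scratch2"],
    ):
        print(cmd)
```

Check the manual for your Lustre version before running any of these commands. With a shared MGS in particular, this thread suggests that running writeconf for one file system also affected the other file system's logs.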
Many thanks for your input
Cheers
Wojciech
--
Wojciech Turek
Assistant System Manager
High Performance Computing Service
University of Cambridge
Email: wjt27 at cam.ac.uk
Tel: (+)44 1223 763517
Andreas Dilger
2010-Jan-05 21:00 UTC
On 2010-01-05, at 10:57, Wojciech Turek wrote:
> On my MDS node I ran this command:
> debugfs -c -R 'ls -l CONFIGS' /dev/drbd_mds03_vg/mgs_lv
> 303121 100644 (1) 0 0 0 4-Jan-2010 11:39 work2-client
>
> work2-client file is zero size and all the OST and MDT files for work2
> file system are missing.
>
> Is there a way to recover this files without stopping work2 file
> system?
>
> If I umount all work2 OSTs and MDT and then run writeconf on them and
> mount them back, would this recreate this missing files?

I suspect yes, though I'm not really the expert in the config code.
Could you please file a bug with details? It doesn't make sense to
delete both configs if only rewriting one of them. It would also be
useful in such cases to create a backup of the config and leave it on
the MGS before deleting it.

> Also can do above without umounting clients (let them wait until
> lustre targets come back) and would this kill any jobs running one
> them?

It shouldn't, but I'm not totally sure what they will do with the new
configuration itself. You will likely have to remount the clients at
some point before you make any changes to the configuration in the
future (e.g. adding an OST or setting tunables), as the currently
mounted clients will likely not detect these changes because of the
new configuration that was created.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.