Hello,

let's assume there had been a severe hardware failure and an entire OST was
lost. Now I simply want to recreate this OST - I run mkfs.lustre and
tunefs.lustre --index={old index of the lost ost}.
Now on mounting this OST, mount.lustre and the MGS complain this index is
already in use. Nice, but how can I convince it the action I try to do is
correct? tunefs.lustre --writeconf /dev/mgsdevice doesn't help.

Any ideas?

Thanks,
Bernd

PS: No real data lost, I'm just too lazy to re-create the entire Lustre
filesystem.

--
Bernd Schubert
Q-Leap Networks GmbH
On Tue, 2008-02-05 at 18:29 +0100, Bernd Schubert wrote:
> Hello,
>
> let's assume there had been a severe hardware failure and an entire OST was
> lost. Now I simply want to recreate this OST - I run mkfs.lustre and
> tunefs.lustre --index={old index of the lost ost}.
> Now on mounting this OST, mount.lustre and the MGS complain this index is
> already in use. Nice, but how can I convince it the action I try to do is
> correct? tunefs.lustre --writeconf /dev/mgsdevice doesn't help.

Put the "--writeconf --index <num>" on the mkfs.lustre command for the OST.

b.
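[Illustrative sketch only: with "testfs", "mgsnode@tcp0" and index 5 standing
in for the real filesystem name, the MGS NID and the lost OST's old index,
the suggestion above amounts to something along these lines:]

    # reformat the replacement OST under its old index and force the
    # configuration to be rewritten when it next registers with the MGS
    mkfs.lustre --ost --fsname=testfs --mgsnode=mgsnode@tcp0 \
                --index=5 --writeconf /dev/device

    # then mount it so it re-registers
    mount -t lustre /dev/device /mnt/ost5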
On Tuesday 05 February 2008 19:30:46 Brian J. Murrell wrote:
> On Tue, 2008-02-05 at 18:29 +0100, Bernd Schubert wrote:
> > Hello,
> >
> > let's assume there had been a severe hardware failure and an entire OST
> > was lost. Now I simply want to recreate this OST - I run mkfs.lustre and
> > tunefs.lustre --index={old index of the lost ost}.
> > Now on mounting this OST, mount.lustre and the MGS complain this index is
> > already in use. Nice, but how can I convince it the action I try to do is
> > correct? tunefs.lustre --writeconf /dev/mgsdevice doesn't help.
>
> Put the "--writeconf --index <num>" on the mkfs.lustre command for the
> OST.

Thanks for your help, but that doesn't help either :( Same error message as
before.

Cheers,
Bernd
On Tue, 2008-02-05 at 21:12 +0100, Bernd Schubert wrote:
>
> Thanks for your help, but that doesn't help either :( Same error message as
> before.

Hrm. Can you post a transcript of your operations?

b.
On Tuesday 05 February 2008 21:22:37 Brian J. Murrell wrote:
> On Tue, 2008-02-05 at 21:12 +0100, Bernd Schubert wrote:
> > Thanks for your help, but that doesn't help either :( Same error message
> > as before.
>
> Hrm. Can you post a transcript of your operations?

1.) mkfs.lustre ... (many parameters) /dev/device
2.) tunefs.lustre --index={failed_index} /dev/device
3.) mount -t lustre /dev/device /mnt/somewhere
    --> failed: Address already in use
4.) Wrote mail and got your answer
5.) mkfs.lustre --index --writeconf ... (many parameters) /dev/device
6.) mount -t lustre /dev/device /mnt/somewhere
    --> failed again: Address already in use

Now I mounted the MGS as ldiskfs, and in CONFIGS/ there is no file for the
missing OST.
But now I just found the reason - the failed OST was still activated on the
clients. After deleting CONFIGS/{fsname}-client and remounting as type lustre
again, registering the failed OST works!
I guess one shouldn't do it this way if one still has important data on the
filesystem ;)

Thanks a lot for your help,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH
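[Spelled out as a hedged sketch of what the paragraph above describes; the
device path, mount point and filesystem name are placeholders, and since this
wipes the client configuration log it is only reasonable when no data on the
filesystem matters:]

    # on the MGS node, with the MGS target stopped
    mount -t ldiskfs /dev/mgsdevice /mnt/mgs
    rm /mnt/mgs/CONFIGS/testfs-client      # the {fsname}-client config log
    umount /mnt/mgs

    # remount as type lustre; the recreated OST can now register under
    # its old index
    mount -t lustre /dev/mgsdevice /mnt/mgs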
Hello PH.

On Tuesday 05 February 2008 18:40:34 you wrote:
> Hi,
> I would like to know that as well, just in case :)
> P.S.: what is the best way to backup a single OST for exactly the situation
> you described?

I think you have seen how I got it working, but I guess this is not the way
one should do it. Probably it would be better to first deactivate this OST on
all clients and then to mount the newly created OST with the old index
(instead of brute-force deleting the client config file as I did).

For OST backup there is, as far as I know, a clusterfs-modified star (*).

Cheers,
Bernd

PS (*): Btw, did someone already tell Joerg his star was modified? I would
just be curious about his reaction ;) (To understand my question you need to
know about discussions with Joerg on LKML, Debian and cd/dvd-related mailing
lists.)

--
Bernd Schubert
Q-Leap Networks GmbH
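[A hedged sketch of the "deactivate first" approach mentioned above; "testfs"
and OST index 0005 are placeholders, and these are the standard lctl
mechanisms from the Lustre manual rather than commands taken from this
thread:]

    # on the MGS: permanently mark the dead OST inactive in the
    # configuration log
    lctl conf_param testfs-OST0005.osc.active=0

    # or, temporarily on each client (and the MDS): find the OSC device
    # for the dead OST and deactivate it
    lctl dl | grep OST0005
    lctl --device <devno> deactivate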
On Tue, 2008-02-05 at 21:45 +0100, Bernd Schubert wrote:
>
> 1.) mkfs.lustre ... (many parameters) /dev/device
> 2.) tunefs.lustre --index={failed_index} /dev/device
> 3.) mount -t lustre /dev/device /mnt/somewhere
>     --> failed: Address already in use
> 4.) Wrote mail and got your answer
> 5.) mkfs.lustre --index --writeconf ... (many parameters) /dev/device
                  ^
You need to specify which index it should be (a numeric value).

> Now I mounted the MGS as ldiskfs, and in CONFIGS/ there is no file for the
> missing OST.
> But now I just found the reason - the failed OST was still activated on the
> clients. After deleting CONFIGS/{fsname}-client and remounting as type
> lustre again, registering the failed OST works!
> I guess one shouldn't do it this way if one still has important data on the
> filesystem ;)

Seems like you worked around it in any case. :-)

> Thanks a lot for your help,

NP. Just sorry my explanation was not more clear.

b.
Bill Wichser
2008-Feb-22 13:47 UTC
[Lustre-discuss] Lustre and MPI-ROMIO, specifically ADIOI_Set_Lock and noncontiguous writing
I'm trying to do noncontiguous writes on a Lustre filesystem using MPI-ROMIO
routines. It is failing in ADIOI_Set_Lock with the message:

    File locking failed in ADIOI_Set_lock. If the file system is NFS, you
    need to use NFS version 3, ensure that the lockd daemon is running on
    all the machines, and mount the directory with the 'noac' option (no
    attribute caching).

Now I've searched the web for some solution but only seem to come across
similar problems. Is there some way to mount a Lustre filesystem so that it
somehow does the right thing here and uses the ufs kernel interface? NFS
mounted filesystems work fine as they apparently provide the file locking
mechanism.

Thanks,
Bill
Andreas Dilger
2008-Feb-25 04:18 UTC
[Lustre-discuss] Lustre and MPI-ROMIO, specifically ADIOI_Set_Lock and noncontiguous writing
On Feb 22, 2008 08:47 -0500, Bill Wichser wrote:
> I'm trying to do noncontiguous writes on a Lustre filesystem using
> MPI-ROMIO routines. It is failing in ADIOI_Set_Lock with the message:
>
> File locking failed in ADIOI_Set_lock. If the file system is NFS, you
> need to use NFS version 3, ensure that the lockd daemon is running on
> all the machines, and mount the directory with the 'noac' option (no
> attribute caching).
>
> Now I've searched the web for some solution but only seem to come across
> similar problems. Is there some way to mount a Lustre filesystem so
> that it somehow does the right thing here and uses the ufs kernel
> interface? NFS mounted filesystems work fine as they apparently provide
> the file locking mechanism.

The default for Lustre is to _not_ support cluster-coherent flock locking,
because it can cause problems in some cases (depending on locking mode).

There are client mount options: "-o localflock", which just pretends that
flock locking is working (it uses only the local node's flock locking), and
"-o flock", which enables real filesystem-wide flock. You can pick between
the two, depending on whether you need real flock or not.

Also, there have been improvements for the Lustre ADIO driver done at ORNL.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
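[A hedged example of the client mount line with the options described above;
"mgsnode@tcp0" and "testfs" are placeholders for the real MGS NID and
filesystem name:]

    # cluster-coherent flock, which lets ROMIO's ADIOI_Set_lock succeed
    mount -t lustre -o flock mgsnode@tcp0:/testfs /mnt/lustre

    # or node-local flock semantics only (cheaper, but not coherent
    # across clients)
    mount -t lustre -o localflock mgsnode@tcp0:/testfs /mnt/lustre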