Craig Prescott
2010-Dec-21 00:33 UTC
[Lustre-discuss] how to reuse OST indices (EADDRINUSE)
Hello list,

We recently evacuated several OSTs on a single OSS, replaced RAID controllers, re-initialized the RAIDs for new OSTs, and made new Lustre filesystems for them, using the same OST indices as we had before.

The filesystem and all its clients have been up and running the whole time. We disabled the OSTs we were working on on all clients and on our MGS/MDS (lctl dl shows them as "IN" everywhere).

Now we want to bring the newly formatted OSTs back online. When we try to mount the "new" OSTs, we get this for each one in the syslog of the OSS that has been under maintenance:

> Lustre: MGC10.13.28.210@o2ib: Reactivating import
> LustreError: 11-0: an error occurred while communicating with 10.13.28.210@o2ib. The mgs_target_reg operation failed with -98
> LustreError: 6065:0:(obd_mount.c:1097:server_start_targets()) Required registration failed for cms-OST0006: -98
> LustreError: 6065:0:(obd_mount.c:1655:server_fill_super()) Unable to start targets: -98
> LustreError: 6065:0:(obd_mount.c:1438:server_put_super()) no obd cms-OST0006
> LustreError: 6065:0:(obd_mount.c:147:server_deregister_mount()) cms-OST0006 not registered

What do we need to do to get these OSTs back into the filesystem? We really want to reuse the original indices.

This is Lustre 1.8.4, btw.

Thanks,
Craig Prescott
UF HPC Center
Hello,

Did you back up the old magic files (last_rcvd, LAST_ID, CONFIGS/*) from the original OSTs and put them back before trying to mount the new ones? You probably didn't do that. So when you remount the OSTs with an existing index, the MGS will refuse to add them without being told to writeconf, hence -EADDRINUSE (-98).

The proper ways to replace an OST are described in bug 24128.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
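The backup-and-restore advice above - preserve last_rcvd, LAST_ID, and CONFIGS/* from each OST before reformatting it - can be sketched roughly as follows. This is only an illustration, not the procedure from bug 24128: the mount-point paths and helper names are assumptions, and it presumes the OST is mounted as ldiskfs with LAST_ID at O/0/LAST_ID (as noted later in this thread).

```python
import os
import shutil

# Files and directories to preserve across a reformat, relative to the
# OST's ldiskfs mount point. LAST_ID lives under O/0/ on the OST.
MAGIC_FILES = ["last_rcvd", "O/0/LAST_ID"]
MAGIC_DIRS = ["CONFIGS"]

def backup_magic(ost_root, backup_dir):
    """Copy the magic files from an OST mounted as ldiskfs."""
    for rel in MAGIC_FILES:
        dst = os.path.join(backup_dir, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(ost_root, rel), dst)
    for rel in MAGIC_DIRS:
        shutil.copytree(os.path.join(ost_root, rel),
                        os.path.join(backup_dir, rel))

def restore_magic(backup_dir, ost_root):
    """Put the saved files back onto the freshly formatted OST."""
    for rel in MAGIC_FILES:
        dst = os.path.join(ost_root, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(os.path.join(backup_dir, rel), dst)
    for rel in MAGIC_DIRS:
        dst = os.path.join(ost_root, rel)
        if os.path.isdir(dst):
            shutil.rmtree(dst)  # replace the freshly formatted copy
        shutil.copytree(os.path.join(backup_dir, rel), dst)
```

The restore step would run after mkfs.lustre, with the new OST again mounted as ldiskfs rather than as type lustre.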
Craig Prescott
2010-Dec-21 14:11 UTC
[Lustre-discuss] how to reuse OST indices (EADDRINUSE)
Thanks for this. You are right - we didn't back up and replace those files. We did this once before, and I don't recall doing a writeconf or anything with magic files, but I guess we must have.

We no longer have the magic files (last_rcvd, LAST_ID, CONFIGS/*) from the old OSTs. According to bug 24128, this puts us in the "cold replace" scenario.

Is there anything we can do to avoid quiescing the entire filesystem? If we can avoid unmounting all the clients, we'd prefer it. Since our combo MGT/MDT is going to have to be unmounted for the writeconf, we'd have the opportunity to mount it as ldiskfs and muck around.

Thanks again,
Craig Prescott
UF HPC Center

Wang Yibin wrote:

> Did you back up the old magic files (last_rcvd, LAST_ID, CONFIGS/*) from the original OSTs and put them back before trying to mount them?
Charles Taylor
2010-Dec-21 15:58 UTC
[Lustre-discuss] how to reuse OST indices (EADDRINUSE)
A little more info here...

We have 30 OSTs hosted on 5 OSSs where we are using Areca 1680ix PCI-E RAID controllers. The long and short of it is that the Areca 1680ix's have proven completely buggy and unreliable - not what you want in a RAID card. So we are evacuating all the OSTs, replacing the Areca 1680ix cards with Adaptec 51645s, re-initializing the LUNs, reformatting the LUNs as OSTs (using the same OST index as before), and remounting them. That is the plan, anyway.

We've already reformatted (mkfs.lustre) one set of 6 OSTs and did not save the "magic" files, and so are getting the "Address in Use" error for those OSTs. That being the case, I assume we must:

1. Unmount the filesystem from all clients
2. Unmount the OSTs
3. Unmount the MDT
4. tunefs.lustre --writeconf /dev/mdt
5. Remount the MDT
6. Remount the OSTs (including the reformatted ones)
7. Remount the filesystem on the clients

Three questions:

1. Is this the correct sequence?
2. Will this leave all our data intact?
3. Must we do a writeconf on the OSTs too, or just the MDT?

Also, for the remaining OSTs we will save the "magic" files and restore them after reformatting, which should eliminate the need for the procedure above. With some of the OSTs mounted as ldiskfs, I see the last_rcvd file and the CONFIGS directory but no LAST_ID file. Should the LAST_ID file be there?

Regards,
Charlie Taylor
UF HPC Center

On Dec 20, 2010, at 11:18 PM, Wang Yibin wrote:

> Did you back up the old magic files (last_rcvd, LAST_ID, CONFIGS/*) from the original OSTs and put them back before trying to mount them?
Andreas Dilger
2010-Dec-21 17:39 UTC
[Lustre-discuss] how to reuse OST indices (EADDRINUSE)
On 2010-12-21, at 8:58, Charles Taylor <taylor@hpc.ufl.edu> wrote:

> So we are evacuating all the OSTs, replacing the Areca 1680ix cards with Adaptec 51645s, re-initializing the LUNs, reformatting the LUNs as OSTs (using the same OST index as before) and remounting them. That is the plan anyway.

It's unfortunate that you didn't see the thread from a few weeks ago that discussed this exact topic of OST replacement. It should get a section in the manual, I think.

> We've already reformatted (mkfs.lustre) one set of 6 OSTs and did not save the "magic" files and so are getting the "Address in Use" error for those OSTs.
>
> 1. Is this the correct sequence?
> 2. Will this leave all our data intact?
> 3. Must we do a writeconf on the OSTs too or just the MDT?

There is actually only a flag on the OSTs that needs to be changed to have them stop trying to register with the MGS, and just pretend to be the OST index that you formatted them as. The flag is in the binary CONFIGS/mountdata file (struct lustre_disk_data), in the ldd_flags field (offset 20). It should only have LDD_F_SV_TYPE_OST (2) set.

> Also, for the remaining OSTs we will save the "magic" files and restore them after reformatting, which should eliminate the need for the procedure above. With some of the OSTs mounted as ldiskfs, I see the last_rcvd file and the CONFIGS directory but no LAST_ID file. Should the LAST_ID file be there?

This file is at /O/0/LAST_ID (capital 'O', then zero) and should be copied for the OSTs you haven't replaced yet, along with the other files. It can be recreated with a binary editor from the value on the MDS (lctl get_param osc.*.prealloc_next_id) for the 6 OSTs that have already been replaced. Search the list or bugzilla for "LAST_ID" for a detailed procedure.
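The CONFIGS/mountdata edit Andreas describes - clearing everything in ldd_flags except the server-type bit - could look like the sketch below. Hedged assumptions: it takes his stated layout at face value (a 32-bit ldd_flags at byte offset 20 of struct lustre_disk_data) and assumes the field is stored little-endian on typical x86 servers; verify both against lustre_disk.h for your Lustre version, and work on a copy of the file first.

```python
import struct

LDD_F_SV_TYPE_OST = 0x0002  # server-type flag; value per the post above
FLAGS_OFFSET = 20           # assumed offset of ldd_flags in lustre_disk_data

def clear_registration_flags(mountdata_path):
    """Rewrite ldd_flags so the target stops trying to (re)register with
    the MGS and just acts as the OST index it was formatted as.
    Returns the previous flags value so the change can be backed out."""
    with open(mountdata_path, "r+b") as f:
        f.seek(FLAGS_OFFSET)
        (old_flags,) = struct.unpack("<I", f.read(4))
        f.seek(FLAGS_OFFSET)
        f.write(struct.pack("<I", LDD_F_SV_TYPE_OST))
    return old_flags
```

The OST would be mounted as ldiskfs (not as type lustre) while editing, and the file re-checked with a hex dump before remounting.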
Charles Taylor
2010-Dec-21 22:46 UTC
[Lustre-discuss] how to reuse OST indices (EADDRINUSE)
On Dec 21, 2010, at 12:39 PM, Andreas Dilger wrote:

> It's unfortunate that you didn't see the thread from a few weeks ago that discussed this exact topic of OST replacement.

Agreed. :(

> It should get a section in the manual I think.

Agreed.

> This file is at /O/0/LAST_ID (capital 'O', then zero) and should be copied for OSTs you haven't replaced yet, along with the other files. It can be recreated with a binary editor from the value on the MDS (lctl get_param osc.*.prealloc_next_id) for the 6 OSTs that have already been replaced.

This seems to do the trick. Thank you! One important clarification, though: on the MDS, should we be getting the value of prealloc_next_id or prealloc_last_id? Section 23.3.9 of the 2.0 Ops Manual, "How to Fix a Bad LAST_ID on an OST", seems to use prealloc_last_id. Which should we be using?

Thank you again,
Charlie Taylor
UF HPC Center
Andreas Dilger
2010-Dec-21 23:18 UTC
[Lustre-discuss] how to reuse OST indices (EADDRINUSE)
On 2010-12-21, at 15:46, Charles Taylor wrote:

> This seems to do the trick. Thank you! One important clarification, though: on the MDS, should we be getting the value of prealloc_next_id or prealloc_last_id? Section 23.3.9 of the 2.0 Ops Manual, "How to Fix a Bad LAST_ID on an OST", seems to use prealloc_last_id. Which should we be using?

Using prealloc_next_id is technically more correct for your situation, since the lower-numbered objects have not been precreated on the OST. If you used prealloc_last_id, then the MDS and OSS would assume the lower-numbered objects exist, but the clients would get I/O errors trying to access them.

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
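Recreating LAST_ID "with a binary editor", as described earlier in the thread, could be sketched as below. This is an illustration, not the detailed list/bugzilla procedure Andreas points to: it assumes LAST_ID holds a single little-endian 64-bit object ID, and whether the exact value to write is prealloc_next_id itself or an adjacent value is precisely the sort of detail to confirm against that procedure before touching a real OST.

```python
import struct

def write_last_id(last_id_path, object_id):
    """Write a 64-bit little-endian object ID into a (re)created
    O/0/LAST_ID file on an OST mounted as ldiskfs."""
    with open(last_id_path, "wb") as f:
        f.write(struct.pack("<Q", object_id))

def read_last_id(last_id_path):
    """Read the ID back, e.g. to cross-check an existing OST's LAST_ID."""
    with open(last_id_path, "rb") as f:
        (object_id,) = struct.unpack("<Q", f.read(8))
    return object_id
```

The value itself comes from the MDS, one entry per OST, via `lctl get_param osc.*.prealloc_next_id` as discussed above.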
Charles Taylor
2010-Dec-21 23:34 UTC
[Lustre-discuss] how to reuse OST indices (EADDRINUSE)
Got it. Thank you again.

Charlie Taylor
UF HPC Center

On Dec 21, 2010, at 6:18 PM, Andreas Dilger wrote:

> Using prealloc_next_id is technically more correct for your situation, since the lower-numbered objects have not been precreated on the OST.