I am emptying a set of OST so that I can reformat the underlying RAID-6
more efficiently.  Two questions:
1. Is there a quick way to tell if the OST is really empty?  lfs_find
takes many hours to run.
2. When I reformat, I want it to retain the same ID so as to not make
"holes" in the list.  From the following information, am I correct to
assume that the id is 24?  If not, how do I determine the correct ID to
use when we re-create the file system?

/dev/sdj               3.5T  3.1T  222G   94%  /mnt/ost51
 10 UP obdfilter umt3-OST0018 umt3-OST0018_UUID 547
umt3-OST0018_UUID      3.4T  3.0T  221.1G  88%  /lustre/umt3[OST:24]
 20 IN osc umt3-OST0018-osc umt3-mdtlov_UUID 5

Thanks much,
bob
On 6 Nov 2010, at 14:24, Bob Ball wrote:
> I am emptying a set of OST so that I can reformat the underlying RAID-6
> more efficiently.  Two questions:
> 1. Is there a quick way to tell if the OST is really empty?  lfs_find
> takes many hours to run.

"lfs df -i" from a client, or simply "df -i" on the OSS node, will tell you while the OST is online; when it is offline you can remount it as type ldiskfs and use ls to verify that it is really empty.

> 2. When I reformat, I want it to retain the same ID so as to not make
> "holes" in the list.  From the following information, am I correct to
> assume that the id is 24?  If not, how do I determine the correct ID to
> use when we re-create the file system?

"tunefs.lustre --print /dev/sdj" will tell you the index in base 10.

Ashley.
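For concreteness, a minimal sketch of those checks, assuming the /dev/sdj device and /lustre/umt3 mount point from the original post; the temporary mount point /mnt/tmp is an assumption here, and exact output varies by Lustre release:

  # On a client: inode counts per OST; an emptied OST shows only a small
  # number of internal objects
  lfs df -i /lustre/umt3

  # On the OSS, with the OST stopped: mount read-only as ldiskfs and inspect
  mount -t ldiskfs -o ro /dev/sdj /mnt/tmp
  ls -lR /mnt/tmp/O/0 | head
  umount /mnt/tmp

  # Print the target configuration, including the index in decimal
  tunefs.lustre --print /dev/sdj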
On 11/06/2010 10:24 AM, Bob Ball wrote:
> I am emptying a set of OST so that I can reformat the underlying RAID-6
> more efficiently.  Two questions:
> 1. Is there a quick way to tell if the OST is really empty?  lfs_find
> takes many hours to run.

Yes ... df -H /path/to/OST/mount_point -- if you have more than a few megabytes in the used column, it is probably in use.  There are other mechanisms, but we find this one works quickest to get a baseline read of how full the OST is.

[...]

> /dev/sdj               3.5T  3.1T  222G   94%  /mnt/ost51
>  10 UP obdfilter umt3-OST0018 umt3-OST0018_UUID 547
> umt3-OST0018_UUID      3.4T  3.0T  221.1G  88%  /lustre/umt3[OST:24]
>  20 IN osc umt3-OST0018-osc umt3-mdtlov_UUID 5

This suggests you are at about 90% utilization of this OST (/dev/sdj).

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
       http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
On 2010-11-06, at 8:24, Bob Ball <ball at umich.edu> wrote:
> I am emptying a set of OST so that I can reformat the underlying RAID-6
> more efficiently.  Two questions:
> 1. Is there a quick way to tell if the OST is really empty?  lfs_find
> takes many hours to run.

If you mount the OST as type ldiskfs and look in the O/0/d* directories (capital-O, zero) there should be a few hundred zero-length objects owned by root.

> 2. When I reformat, I want it to retain the same ID so as to not make
> "holes" in the list.  From the following information, am I correct to
> assume that the id is 24?  If not, how do I determine the correct ID to
> use when we re-create the file system?

If you still have the existing OST, the easiest way to do this is to save the files last_rcvd, O/0/LAST_ID, and CONFIGS/*, and copy them into the reformatted OST.

> /dev/sdj               3.5T  3.1T  222G   94%  /mnt/ost51
>  10 UP obdfilter umt3-OST0018 umt3-OST0018_UUID 547
> umt3-OST0018_UUID      3.4T  3.0T  221.1G  88%  /lustre/umt3[OST:24]
>  20 IN osc umt3-OST0018-osc umt3-mdtlov_UUID 5

The OST index is indeed 24 (18 hex).  As for /dev/sdj, it is hard to know from the above info.  If you run "e2label /dev/sdj" the filesystem label should match the OST name umt3-OST0018.

Cheers, Andreas
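A rough sketch of that inspection and backup step, assuming the /dev/sdj device from the thread; the mount point /mnt/ost_ldiskfs and save directory /root/ost0018-save are assumed names:

  mkdir -p /mnt/ost_ldiskfs /root/ost0018-save/CONFIGS
  mount -t ldiskfs /dev/sdj /mnt/ost_ldiskfs

  # objects live under O/0/d0 .. d31; on an emptied OST anything left should
  # be zero-length and owned by root, so this should print nothing
  find /mnt/ost_ldiskfs/O/0 -type f ! -size 0 | head

  # save the per-target state before reformatting
  cp -a /mnt/ost_ldiskfs/last_rcvd    /root/ost0018-save/
  cp -a /mnt/ost_ldiskfs/O/0/LAST_ID  /root/ost0018-save/
  cp -a /mnt/ost_ldiskfs/CONFIGS/.    /root/ost0018-save/CONFIGS/
  umount /mnt/ost_ldiskfs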
Responding to everyone.... (and thanks to all)

> lfs df -i from a client or simply df -i from the OSS node ...
This still shows on the order of 100 inodes after the OST was emptied.

> "tunefs.lustre --print /dev/sdj" will tell you the index in base 10.
Yes, this worked.

> df -H /path/to/OST/mount_point
This still shows several hundred MB as well, in disparate amounts per OST I'm emptying.  An indicator, but not perfect.  Yes, the OST were near full to begin with.

More below.

On 11/6/2010 11:09 AM, Andreas Dilger wrote:
> On 2010-11-06, at 8:24, Bob Ball <ball at umich.edu> wrote:
>> I am emptying a set of OST so that I can reformat the underlying RAID-6
>> more efficiently.  Two questions:
>> 1. Is there a quick way to tell if the OST is really empty?  lfs_find
>> takes many hours to run.
> If you mount the OST as type ldiskfs and look in the O/0/d* directories (capital-O, zero) there should be a few hundred zero-length objects owned by root.

I assume this is ALL I should see.  Certainly the used inode count agrees with this.  I will look shortly.

>> 2. When I reformat, I want it to retain the same ID so as to not make
>> "holes" in the list.  From the following information, am I correct to
>> assume that the id is 24?  If not, how do I determine the correct ID to
>> use when we re-create the file system?
> If you still have the existing OST, the easiest way to do this is to save the files last_rcvd, O/0/LAST_ID, and CONFIGS/*, and copy them into the reformatted OST.

I will do that, thanks.

>> /dev/sdj               3.5T  3.1T  222G   94%  /mnt/ost51
>>  10 UP obdfilter umt3-OST0018 umt3-OST0018_UUID 547
>> umt3-OST0018_UUID      3.4T  3.0T  221.1G  88%  /lustre/umt3[OST:24]
>>  20 IN osc umt3-OST0018-osc umt3-mdtlov_UUID 5
> The OST index is indeed 24 (18 hex).  As for /dev/sdj, it is hard to know from the above info.  If you run "e2label /dev/sdj" the filesystem label should match the OST name umt3-OST0018.

I intend to be VERY careful with this.  Thank you all.  Any further advice before I do this, likely on Monday, will be greatly appreciated.

bob

> Cheers, Andreas
On 6 Nov 2010, at 19:28, Bob Ball wrote:
> I intend to be VERY careful with this.  Thank you all.  Any further
> advice before I do this, likely on Monday, will be greatly appreciated.

I believe it is possible to use udev to assign device names to devices, which would make the pathname both consistent across boots and configurable.  If it isn't possible with udev then I know it's possible with multipath, as we always use this on DDN systems (I work for DDN); it both handles failover between devices and allows them to be named as we choose.  I'm not saying we're not careful ourselves, but you need to be a lot less careful if you are working with /dev/mapper/ost_24 than if you are working with /dev/sdj.

Basically, install device-mapper-multipath and add the entries to /etc/multipath.conf.  On one of our test systems in the lab we have the following currently; this is a virtual test system, so only a single path to each device.

/dev/mapper/ost_sab_1      4128448   1579584   2339152  41% /lustre/sab/ost_1
/dev/mapper/ost_sab_0      4128448   1278200   2640536  33% /lustre/sab/ost_0

Ashley.

-- 
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
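A minimal sketch of the kind of /etc/multipath.conf entry involved, assuming device-mapper-multipath is installed; the WWID below is a placeholder that you would take from "multipath -ll" for the LUN backing that OST:

  multipaths {
      multipath {
          wwid   3600a0b8000123456000067890abcdef0   # placeholder WWID
          alias  ost_24
      }
  }

After reloading the multipathd service, the device then shows up as /dev/mapper/ost_24 regardless of which /dev/sdX name the kernel assigned on that boot.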
Hi, Andreas.

Tomorrow, we will redo all 8 OST on the first file server we are redoing.  I am very nervous about this, as a lot is riding on us doing this correctly.  For example, on a client now, if I umount one of the ost without first taking some (unknown to me) action on the MDT, then the client will hang on the "df" command.

So, while we are doing the reformat, is there any way to avoid this "hang" situation?

Is the --index=XX argument to mkfs.lustre hex, or decimal?  Seems from your comment below that this must be hex?

Finally, does supplying the --index even matter if we restore the files below that you mention?  That seems to be what you are saying.

Thanks much,
bob

On 11/6/2010 11:09 AM, Andreas Dilger wrote:
> On 2010-11-06, at 8:24, Bob Ball <ball at umich.edu> wrote:
>> [...]
> If you mount the OST as type ldiskfs and look in the O/0/d* directories (capital-O, zero) there should be a few hundred zero-length objects owned by root.
>
> If you still have the existing OST, the easiest way to do this is to save the files last_rcvd, O/0/LAST_ID, and CONFIGS/*, and copy them into the reformatted OST.
>
> The OST index is indeed 24 (18 hex).  As for /dev/sdj, it is hard to know from the above info.  If you run "e2label /dev/sdj" the filesystem label should match the OST name umt3-OST0018.
>
> Cheers, Andreas
BTW, the new OST sizes will be much different from the original OST sizes.  Is the "copy the old file" method below still valid in this case?

bob

On 11/7/2010 2:32 PM, Bob Ball wrote:
> Hi, Andreas.
>
> Tomorrow, we will redo all 8 OST on the first file server we are
> redoing.  I am very nervous about this, as a lot is riding on us doing
> this correctly.
> [...]
>
> On 11/6/2010 11:09 AM, Andreas Dilger wrote:
>> [...]
>> If you still have the existing OST, the easiest way to do this is to save the files last_rcvd, O/0/LAST_ID, and CONFIGS/*, and copy them into the reformatted OST.
>> [...]
On 7 Nov 2010, at 19:32, Bob Ball wrote:
> So, while we are doing the reformat, is there any way to avoid this
> "hang" situation?

I believe there is, but it escapes me at the minute.

> Is the --index=XX argument to mkfs.lustre hex, or decimal?  Seems from
> your comment below that this must be hex?

Decimal.

> Finally, does supplying the --index even matter if we restore the files
> below that you mention?  That seems to be what you are saying.

That's my understanding as well.

Use tunefs.lustre --print after the format to verify all the options are the same, including the index but also the flags, mount options and nodelist.  One thing I've seen people forget is to enable quotas on new (or replacement) OSTs, which has the effect of disabling quotas until you are able to run quotacheck - in a live filesystem this can be effectively permanent.

Ashley.

-- 
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk
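A short sketch of that verification pass, using the device and target names from this thread; the file under /root is an assumed location for the pre-reformat record:

  # Record the configuration before the reformat ...
  tunefs.lustre --print /dev/sdj > /root/ost0018-tunefs.before

  # ... and compare after recreating the target
  tunefs.lustre --print /dev/sdj | diff -u /root/ost0018-tunefs.before -

  # The filesystem label should also match the OST name
  e2label /dev/sdj    # expect umt3-OST0018 for index 24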
Thanks, Ashley.  No quotas, fortunately.  Tomorrow will be "fun".

bob

On 11/7/2010 3:44 PM, Ashley Pittman wrote:
> On 7 Nov 2010, at 19:32, Bob Ball wrote:
>> So, while we are doing the reformat, is there any way to avoid this
>> "hang" situation?
> I believe there is, but it escapes me at the minute.
>
>> Is the --index=XX argument to mkfs.lustre hex, or decimal?  Seems from
>> your comment below that this must be hex?
> Decimal.
>
>> Finally, does supplying the --index even matter if we restore the files
>> below that you mention?  That seems to be what you are saying.
> That's my understanding as well.
>
> Use tunefs.lustre --print after the format to verify all the options are the same, including the index but also the flags, mount options and nodelist.  One thing I've seen people forget is to enable quotas on new (or replacement) OSTs, which has the effect of disabling quotas until you are able to run quotacheck - in a live filesystem this can be effectively permanent.
>
> Ashley.
None of the Lustre config files stores the OST size, so it should be fine.

Note that even if your OST isn't empty, you can just copy over all of the files into the newly-formatted filesystem, so long as you copy the xattrs with them.

Cheers, Andreas

On 2010-11-07, at 12:43, Bob Ball <ball at umich.edu> wrote:
> BTW, the new OST sizes will be much different from the original OST sizes.  Is the "copy the old file" method below still valid in this case?
>
> bob
>
> [...]
On 2010-11-07, at 12:32, Bob Ball <ball at umich.edu> wrote:
> Tomorrow, we will redo all 8 OST on the first file server we are redoing.  I am very nervous about this, as a lot is riding on us doing this correctly.  For example, on a client now, if I umount one of the ost, without first taking some (unknown to me) action on the MDT, then the client will hang on the "df" command.
>
> So, while we are doing the reformat, is there any way to avoid this "hang" situation?

If you issue "lctl --device %{OSC UUID} deactivate" on the MDS and clients, then any operations on those OSTs will immediately fail with an IO error.  If you are migrating objects off those OSTs, I would have imagined you already did this on the MDS, or new objects would have continued to be allocated there.

> Is the --index=XX argument to mkfs.lustre hex, or decimal?  Seems from your comment below that this must be hex?

Decimal, though it may also accept hex (I can't check right now).

> Finally, does supplying the --index even matter if we restore the files below that you mention?  That seems to be what you are saying.

Well, you still need to set the filesystem label.  This could be done with tune2fs, but you may as well specify the right index from the beginning.

> On 11/6/2010 11:09 AM, Andreas Dilger wrote:
>> [...]
>> If you still have the existing OST, the easiest way to do this is to save the files last_rcvd, O/0/LAST_ID, and CONFIGS/*, and copy them into the reformatted OST.
>> [...]
>> The OST index is indeed 24 (18 hex).  As for /dev/sdj, it is hard to know from the above info.  If you run "e2label /dev/sdj" the filesystem label should match the OST name umt3-OST0018.
>>
>> Cheers, Andreas
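A hedged sketch of what those two steps can look like on this system; the OSC device number 20 is taken from the lctl dl output in the original post, the fsname and index come from the thread, and the MGS NID is a placeholder:

  # On the MDS (and on clients): deactivate the OSC for the OST being replaced
  lctl dl | grep OST0018         # e.g. "20 IN osc umt3-OST0018-osc umt3-mdtlov_UUID 5"
  lctl --device 20 deactivate    # device number (or name) taken from lctl dl

  # Reformat the new RAID device, keeping the same (decimal) index
  mkfs.lustre --ost --fsname=umt3 --index=24 \
      --mgsnode=192.168.1.10@tcp0 /dev/sdj    # placeholder MGS NID

  # The label should come out matching the OST name
  e2label /dev/sdj               # expect umt3-OST0018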
OK, made new raid, made file system with same index, but they won't mount.  This is the error.  What can I do here?

bob

mounting device /dev/sdc at /mnt/ost12, flags=0 options=device=/dev/sdc
mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use  retries left: 0
mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
The target service's index is already in use. (/dev/sdc)

On 11/8/2010 5:01 AM, Andreas Dilger wrote:
> [...]
Don't know if I sent to the whole list.  One of those days.

Remade the raid device, remade the lustre fs on it, but the disks won't mount.  Error is below.  How do I overcome this?

Thanks,
bob

mounting device /dev/sdc at /mnt/ost12, flags=0 options=device=/dev/sdc
mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use  retries left: 0
mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
The target service's index is already in use. (/dev/sdc)

On 11/8/2010 5:01 AM, Andreas Dilger wrote:
> [...]
On 2010-11-08, at 11:39, Bob Ball wrote:
> Don't know if I sent to the whole list.  One of those days.
>
> Remade the raid device, remade the lustre fs on it, but the disks won't mount.  Error is below.  How do I overcome this?
>
> mounting device /dev/sdc at /mnt/ost12, flags=0 options=device=/dev/sdc
> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use  retries left: 0
> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
> The target service's index is already in use. (/dev/sdc)

Looks like you didn't copy the old "CONFIGS/mountdata" file over the new one.  You can also use "--writeconf" (described in the manual and several times on the list) to have the MGS re-generate the configuration, which should fix this as well.

> On 11/8/2010 5:01 AM, Andreas Dilger wrote:
>> [...]

Cheers, Andreas
-- 
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
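A sketch of the two fixes Andreas describes, assuming the per-target files were saved beforehand into a directory such as /root/ost-save (an assumed path), and using the /dev/sdc device from the error above:

  # Fix 1: copy the saved files back onto the freshly formatted OST
  mount -t ldiskfs /dev/sdc /mnt/ost_ldiskfs
  mkdir -p /mnt/ost_ldiskfs/O/0              # in case O/0 does not exist yet
  cp -a /root/ost-save/last_rcvd   /mnt/ost_ldiskfs/
  cp -a /root/ost-save/LAST_ID     /mnt/ost_ldiskfs/O/0/LAST_ID
  cp -a /root/ost-save/CONFIGS/.   /mnt/ost_ldiskfs/CONFIGS/
  umount /mnt/ost_ldiskfs

  # Fix 2: have the MGS regenerate the configuration logs instead; see the
  # operations manual for the full writeconf procedure (it is normally done
  # on all targets with the filesystem stopped)
  tunefs.lustre --writeconf /dev/sdc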
On 08/11/2010 21:04, Andreas Dilger wrote:
> Looks like you didn't copy the old "CONFIGS/mountdata" file over the new one.  You can also use "--writeconf" (described in the manual and several times on the list) to have the MGS re-generate the configuration, which should fix this as well.

Tell me if I'm wrong regarding this OST update.  AFAIK, there are two ways to replace an OST with a new one:

Hot replace:
1 - Disable your OST on the MDT (lctl deactivate)
2 - Empty your OST
3 - Backup the magic files (last_rcvd, LAST_ID, CONFIGS/*)
4 - Deactivate the OST on all clients also
5 - Unmount the OST
6 - Replace, reformat using the same index
7 - Put back the backed-up magic files
8 - Restart the OST
9 - Activate the OST everywhere
(a command-level sketch of this sequence follows below)

Cold replace:
1 - Empty your OST
2 - Stop your filesystem
3 - Replace/reformat using the same index
4 - Restart using --writeconf
5 - Remount the clients

Did I miss something?

As far as I understand this, the important point here is to have the OST internal information in sync with what the MGS (CONFIGS/*) and the MDT (last_rcvd, LAST_ID) know.

What currently prevents a freshly formatted OST with the same index from registering itself properly (using the first_time flag) to the MGS and MDT when remounting, and:
- refreshing its CONFIGS from the MGS internal cache
- telling the MDT to reset the last_rcvd/LAST_ID it knows for this OST?
That way, we could have an easy way to hot-replace an OST.  How do you think this can be achieved?

Thanks

Aurélien
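For reference, a hedged command-level sketch of the hot-replace sequence above, reusing the names from this thread (fsname umt3, index 24, device /dev/sdj, mount point /mnt/ost51); the MGS NID is a placeholder, and this is a sketch of the steps rather than a tested recipe:

  # steps 1 and 4: deactivate the OSC on the MDS and on clients
  lctl --device umt3-OST0018-osc deactivate

  # steps 2-3: empty the OST, then save last_rcvd, O/0/LAST_ID and CONFIGS/*
  # (see the backup sketch earlier in the thread)

  # steps 5-6: unmount and reformat with the same index; --reformat is needed
  # because the device still carries the old target
  umount /mnt/ost51
  mkfs.lustre --ost --reformat --fsname=umt3 --index=24 \
      --mgsnode=192.168.1.10@tcp0 /dev/sdj

  # steps 7-9: restore the saved files, remount the OST, reactivate the OSC
  mount -t lustre /dev/sdj /mnt/ost51
  lctl --device umt3-OST0018-osc activate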
Yes, you are correct.  That was the key here, did not put that file back in place.  Back up and (so far) operating cleanly.

Thanks,
bob

On 11/8/2010 3:04 PM, Andreas Dilger wrote:
> On 2010-11-08, at 11:39, Bob Ball wrote:
>> Don't know if I sent to the whole list.  One of those days.
>>
>> Remade the raid device, remade the lustre fs on it, but the disks won't mount.  Error is below.  How do I overcome this?
>>
>> mounting device /dev/sdc at /mnt/ost12, flags=0 options=device=/dev/sdc
>> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use  retries left: 0
>> mount.lustre: mount /dev/sdc at /mnt/ost12 failed: Address already in use
>> The target service's index is already in use. (/dev/sdc)
> Looks like you didn't copy the old "CONFIGS/mountdata" file over the new one.  You can also use "--writeconf" (described in the manual and several times on the list) to have the MGS re-generate the configuration, which should fix this as well.
>
> [...]
>
> Cheers, Andreas
> -- 
> Andreas Dilger
> Lustre Technical Lead
> Oracle Corporation Canada Inc.
On 2010-11-08, at 14:18, Aurélien Degrémont wrote:
> Tell me if I'm wrong regarding this OST update.
> AFAIK, there are two ways to replace an OST with a new one:
>
> Hot replace:
> 1 - Disable your OST on the MDT (lctl deactivate)
> 2 - Empty your OST
> 3 - Backup the magic files (last_rcvd, LAST_ID, CONFIGS/*)
> 4 - Deactivate the OST on all clients also
> 5 - Unmount the OST
> 6 - Replace, reformat using the same index
> 7 - Put back the backed-up magic files
> 8 - Restart the OST
> 9 - Activate the OST everywhere
>
> Cold replace:
> 1 - Empty your OST
> 2 - Stop your filesystem
> 3 - Replace/reformat using the same index
> 4 - Restart using --writeconf
> 5 - Remount the clients

6 - fix up the MDS's idea of the OST's last-allocated object.

> Did I miss something?

Other than #6, it looks correct.

> As far as I understand this, the important point here is to have the OST
> internal information in sync with what the MGS (CONFIGS/*) and the MDT
> (last_rcvd, LAST_ID) know.

Right.

> What currently prevents a freshly formatted OST with the same
> index from registering itself properly (using the first_time flag) to the
> MGS and MDT when remounting, and:
> - refreshing its CONFIGS from the MGS internal cache
> - telling the MDT to reset the last_rcvd/LAST_ID it knows for this OST?
> That way, we could have an easy way to hot-replace an OST.
> How do you think this can be achieved?

It probably wouldn't be impossible to have a new OST gracefully replace an old one, if that is what the administrator wanted.  Some "special" action would need to be taken on the OST and/or MDT to ensure that this is what the admin wanted, instead of e.g. accidentally inserting some other OST with the same index and corrupting the filesystem because of duplicate object IDs, or not being able to access existing objects on the "real" OST at that index.

- the new OST would be best off to start allocating objects at the LAST_ID
  of the old OST, so that there is no risk of confusion between objects
- the MDT contains the old LAST_ID in its lov_objids file, and it sends this
  to the OST at connection time, this is no problem
- currently the new OST will refuse to allow the MDT to connect, because it
  detects that the old LAST_ID value from the MDT is inconsistent with its
  own value
- it would be relatively straightforward to have the OST detect if the local
  LAST_ID value was "new" and use the MDT value instead
- the danger is if the LAST_ID file was lost for some reason (e.g. corruption
  causes e2fsck to erase it).  in that case, the OST startup code should be
  smart enough to regenerate LAST_ID based on walking the object directories,
  which would also avoid the need to do this in e2fsck/lfsck (which can only
  run offline)
- in cases where the on-disk LAST_ID is much lower than the MDT-supplied
  value, the OST should just skip precreation of all the intermediate objects
  and just start using the new MDT value
- the only other thing is to avoid the case where a "new" OST is accidentally
  assigned the same index, when that isn't what is wanted.  There needs to be
  some way to "prime" the new OST (that is NOT the default for a newly
  formatted OST), or conversely tell the MDT that it should signal the new
  OST to take the place of the old one, so that there are not any mistakes

In conclusion, most of this is already close to working, but needs some amount of effort to get it tested and working smoothly.  Since this is something that has come up on this list a number of times in the last year, I guess it means that a Lustre filesystem is now outliving the hardware on which it runs, so it would definitely be worthwhile for someone to look at this.  I filed bug 24128 on this, in case anyone wants to work on it.

Cheers, Andreas
-- 
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
Hi Andreas Dilger a ?crit :>> Cold replace: >> 1 - Empty your OST >> 2 - Stop your filesystem >> 3 - Replace/reformat using the same index >> 4 - Restart using --writeconf >> 5 - Remount the clients > 6 - fix up the MDS''s idea of the OST''s last-allocated object. > >> Did I miss something ? > > Other than #6, it looks correct. >How do you fix #6? What are the actions needed for that?>> What is currently preventing, a freshly formatted OST with the same >> index, to register itself properly (using first_time flag) to MGS and >> MDT when remounting and: >> - refreshing its CONFIG from MGS internal cache >> - telling MDT to reset last_rcvd/LAST_ID it knows for this OST. >> That way, we could have an easy way to hot replace an OST. >> How do you think this can be achieved ? > > It probably wouldn''t be impossible to have a new OST gracefully replace an old one, if that is what the administrator wanted. Some "special" action would need to be taken on the OST and/or MDT to ensure that this is what the admin wanted, instead of e.g. accidentally inserting some other OST with the same index and corrupting the filesystem because of duplicate object IDs, or not being able to access existing objects on the "real" OST at that index. > > - the new OST would be best off to start allocating objects at the LAST_ID > of the old OST, so that there is no risk of confusion between objects > - the MDT contains the old LAST_ID in it''s lov_objids file, and it sends this > to the OST at connection time, this is no problem > - currently the new OST will refuse to allow the MDT to connect, because it > detects that the old LAST_ID value from the MDT is inconsistent with its > own value > - it would be relatively straight forward to have the OST detect if the local > LAST_ID value was "new" and use the MDT value insteadCan we based this check on ''first_time'' flag. I mean, OST update its LAST_ID based on what MDT tell it only if it has the ''first_time'' flag set.> - the danger is if the LAST_ID file was lost for some reason (e.g. corruption > causes e2fsck to erase it). in that case, the OST startup code should be > smart enough to regenerate LAST_ID based on walking the object directories, > which would also avoid the need to do this in e2fsck/lfsck (which can only > run offline) > - in cases where the on-disk LAST_ID is much lower than the MDT-supplied > value, the OST should just skip precreation of all the intermediate objects > and just start using the new MDT valueThis seems a different feature, even if related, which is "Better handling of LAST_ID corruption".> - the only other thing is to avoid the case where a "new" OST is accidentally > assigned the same index, when that isn''t what is wanted. There needs to be > some way to "prime" the new OST (that is NOT the default for a newly > formatted OST), or conversely tell the MDT that it should signal the new > OST to take the place of the old one, so that there are not any mistakesIndeed, this is important. And if we want to have this supports online replace. Another option when formatting OST? --replace ? Which is only accepted when --index is set?> Since this is something that has come up on this list a number of times in the last year, I guess it means that a Lustre filesystem is now outliving the hardware on which it runs, so it would definitely be worthwhile for someone to look at this. I filed bug 24128 on this, in case anyone wants to work on it.Can you also add it to Community project list? Thanks -- Aurelien Degremont CEA
On 2010-11-09, at 03:07, Aurelien Degremont wrote:
> Andreas Dilger wrote:
>>> Cold replace:
>>> [...]
>>> 5 - Remount the clients
>> 6 - fix up the MDS's idea of the OST's last-allocated object.
>>> Did I miss something?
>> Other than #6, it looks correct.
>
> How do you fix #6?  What are the actions needed for that?

That is what is described in the rest of this email...

>> - it would be relatively straightforward to have the OST detect if the local
>>   LAST_ID value was "new" and use the MDT value instead
>
> Can we base this check on the 'first_time' flag?
> I mean, the OST updates its LAST_ID based on what the MDT tells it only if it has the 'first_time' flag set.

The problem is that if the 'first_time' flag is always set on a new OST, then any OST accidentally claiming the same index (e.g. from a test filesystem of the same name, or from user error) could replace the valid OST.  This 'first_time' flag could not be the default.

>> - the danger is if the LAST_ID file was lost for some reason (e.g. corruption
>>   causes e2fsck to erase it). [...]
>> - in cases where the on-disk LAST_ID is much lower than the MDT-supplied
>>   value, the OST should just skip precreation of all the intermediate objects
>>   and just start using the new MDT value
>
> This seems a different feature, even if related, which is "Better handling of LAST_ID corruption".

Partly, yes.

>> - the only other thing is to avoid the case where a "new" OST is accidentally
>>   assigned the same index, when that isn't what is wanted.  There needs to be
>>   some way to "prime" the new OST (that is NOT the default for a newly
>>   formatted OST), or conversely tell the MDT that it should signal the new
>>   OST to take the place of the old one, so that there are not any mistakes
>
> Indeed, this is important, and it is needed if we want this to support online replace.  Another option when formatting the OST?
> --replace?  Which is only accepted when --index is set?

Yes, that would probably be a good way to handle it from the user interface.  The other question is how to handle this internally.  Probably a flag stored in the mountinfo or last_rcvd file.

>> Since this is something that has come up on this list a number of times in the last year, [...] I filed bug 24128 on this, in case anyone wants to work on it.
>
> Can you also add it to the Community project list?

Done.

Cheers, Andreas
-- 
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
Well, we ran 2 days, migrating files off OST, then this morning the MDT crashed.  Could not get all clients reconnected before seeing another kernel panic on the mdt.  Did an e2fsck of the mdt db and tried again.  Crashed again, but this time the logged message is:

2010-11-10T12:40:26-05:00 lmd01.local kernel: [12307.325340] Lustre: 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
2010-11-10T12:40:27-05:00 lmd01.local kernel: [12308.347087] Lustre: 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids

I've seen this message elsewhere, but can't seem to find anything on it now, or what to do about it.

help?

bob

On 11/8/2010 4:27 PM, Bob Ball wrote:
> Yes, you are correct.  That was the key here, did not put that file back
> in place.  Back up and (so far) operating cleanly.
>
> [...]
If this helps, the console shows this stuff at the kernel panic, leaving out most of the addresses and offsets for this "retyping".

bob

:ptlrpc:ldlm_handle_enqueue
:mds:mds_handle
:lnet:lnet_match_blocked_msg
:ptlrpc:lustre_msg_get_conn_cnt
:ptlrpc:ptlrpc_server_handle_request
__activate_task
try_to_wake_up
lock_timer_base
__mod_timer
:ptlrpc:ptlrpc_main
default_wake_function
audit_syscall_exit
child_rip
:ptlrpc:ptlrpc_main
child_rip
Code: 41 8b 14 d3 89 54 24 54 31 d2 29 c5 89 6c 24 58 0f 84 bf 00
RIP [<ffffffff88c644ef>] :ldiskfs:do_split
RSP <ffff810422ae53b0>
CR2: ffff810acc143e38
<0> Kernel panic - not syncing: Fatal exception

On 11/10/2010 1:01 PM, Bob Ball wrote:
> Well, we ran 2 days, migrating files off OST, then this morning the MDT
> crashed.  Could not get all clients reconnected before seeing another
> kernel panic on the mdt.  Did an e2fsck of the mdt db and tried again.
> Crashed again, but this time the logged message is:
>
> 2010-11-10T12:40:26-05:00 lmd01.local kernel: [12307.325340] Lustre:
> 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
> 2010-11-10T12:40:27-05:00 lmd01.local kernel: [12308.347087] Lustre:
> 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
>
> I've seen this message elsewhere, but can't seem to find anything on it
> now, or what to do about it.
>
> help?
>
> bob
>
> [...]
On 2010-11-10, at 11:01, Bob Ball wrote:
> Well, we ran 2 days, migrating files off OST, then this morning, the MDT
> crashed. Could not get all clients reconnected before seeing another
> kernel panic on the MDT. Did an e2fsck of the MDT db and tried again.
> Crashed again, but this time the logged message is:
>
> 2010-11-10T12:40:26-05:00 lmd01.local kernel: [12307.325340] Lustre:
> 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
> 2010-11-10T12:40:27-05:00 lmd01.local kernel: [12308.347087] Lustre:
> 6243:0:(mds_lov.c:330:mds_lov_update_objids()) Unexpected gap in objids
>
> I've seen this message elsewhere, but can't seem to find anything on it
> now, or what to do about it.

This might be a recovery-only problem. Try mounting the MDS with the mount option "-o abort_recovery".

[...]

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
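As a concrete sketch of that suggestion (the MDT device path and mount point below are placeholders, not taken from Bob's setup; the mount.lustre man page also lists a shorter spelling of the option, abort_recov):

    # Unmount the MDT if it is currently mounted, then remount it while
    # skipping the recovery window. All clients are evicted rather than
    # replaying their in-flight requests.
    umount /mnt/mdt
    mount -t lustre -o abort_recovery /dev/mdt_device /mnt/mdt

The trade-off is that any client operations that were in progress at crash time fail with an error instead of being replayed, which is why this is normally a last resort when recovery itself keeps panicking the MDS.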
Yes, this brought us back up (sorry, took us a while). Clients see the system, and I can read and write files. But......

What have we lost by doing this? Can we now let it go and recover as usual? What is the next step here?

bob

On 11/10/2010 3:00 PM, Andreas Dilger wrote:
> On 2010-11-10, at 11:01, Bob Ball wrote:
>> Well, we ran 2 days, migrating files off OST, then this morning, the MDT
>> crashed. Could not get all clients reconnected before seeing another
>> kernel panic on the MDT.
>> [...]
>> I've seen this message elsewhere, but can't seem to find anything on it
>> now, or what to do about it.
>
> This might be a recovery-only problem. Try mounting the MDS with the mount option "-o abort_recovery".
>
> [...]
On 2010-11-10, at 14:40, Bob Ball wrote:
> Yes, this brought us back up (sorry, took us a while). Clients see the system, and I can read and write files. But......
>
> What have we lost by doing this? Can we now let it go and recover as usual? What is the next step here?

The abort_recovery option evicted all of the clients, so any of their in-progress operations would have failed. They have all since reconnected and no action is needed.

[...]

Cheers, Andreas
--
Andreas Dilger
Lustre Technical Lead
Oracle Corporation Canada Inc.
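For anyone who wants to double-check that state after the fact, a quick look on the MDS should show whether recovery is complete and how many clients reconnected versus were evicted. The parameter path below is from the 1.8-era proc layout, so treat it as an assumption rather than gospel:

    # On the MDS: report recovery state (COMPLETE/INACTIVE/RECOVERING),
    # connected clients, and evicted clients. The exact path may differ
    # on other Lustre versions.
    lctl get_param mds.*.recovery_status

Clients that were evicted simply reconnect on their next RPC, so a clean df or lfs df from a client node is also a reasonable sanity check that the filesystem is fully back.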