Hi all,

after failure of a server contributing two OSTs to our Lustre fs, I'm
having trouble either getting rid of these OSTs for good or re-introducing
them. (It's a test system; the data on it may be thrown away at any time if
necessary.) The system is running Debian Etch, kernel 2.6.20, Lustre 1.6.0.1.

Trying to mount the OSTs invariably gives me

kernel: LustreError: Trying to start OBD testfs1-OST000b_UUID using the
wrong disk . Were the /dev/ assignments rearranged?
kernel: LustreError: 7792:0:(filter.c:1008:filter_prep()) cannot read
last_rcvd: rc = -22
...

The log messages following these lines are a consequence of these, I guess.
Although I'm not sure what the /dev/ assignments might refer to, nothing has
been changed on this machine - just a reboot (and maybe a damaged partition,
of course). I also haven't found the meaning of the error code -22.

I went on trying to unregister these OSTs on the MGS. The Lustre manual says

$ mgs> lctl conf_param testfs-OST0001.osc.active=0

This doesn't work, nor do most of the examples given in
http://manual.lustre.org/manual/LustreManual16_HTML - for which Lustre
version was this manual written? 'man lctl' tells me that the --device
option may be missing. On the MGS, I got

$ mgs> lctl dl
...
19 UP osc testfs1-OST000a-osc testfs1-mdtlov_UUID 5
20 UP osc testfs1-OST000b-osc testfs1-mdtlov_UUID 5

(Something else that I'm missing painfully in all the Lustre documentation:
an explanation of the output of commands!)
My guess was that the correct name for my OSTs is given in the fourth field,
so I tried

$ mgs> lctl --device testfs1-OST000a-osc conf_param testfs1-OST000b.osc.active=0

This at least didn't give me an error. The output of 'lctl dl' did not
change, however; 19 and 20 are still there and UP.

$ mgs> lctl --device testfs1-OST000a-osc deactivate

had the same result.

Still, I went on to the OSS and tried

$ oss> tunefs.lustre --erase-params --fsname=testfs1 --ost --mgsnode=MGS@tcp0 /dev/sdb1

which doesn't work because of

tunefs.lustre: cannot change the name of a registered target
tunefs.lustre: exiting with 1 (Operation not permitted)

$ oss> tunefs.lustre --writeconf --erase-params --fsname=testfs1 --ost --mgsnode=MGS@tcp0 /dev/sdb1

works fine, but mounting the partition results in exactly the same error
messages in the syslog as before.

So far I have not tried reformatting these partitions. But I think I should
ask the experts here about all the mistakes I made.

Many thanks,
Thomas

--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 2126
Fax: +49-6159-71 2986

Gesellschaft für Schwerionenforschung mbH
Planckstraße 1
D-64291 Darmstadt
www.gsi.de

Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Darmstadt
Handelsregister: Amtsgericht Darmstadt, HRB 1528
Geschäftsführer: Professor Dr. Walter F. Henning, Dr. Alexander Kurz
Vorsitzende des Aufsichtsrates: Dr. Beatrix Vierkorn-Rudolph,
Stellvertreter: Ministerialdirigent Dr. Rolf Bernhardt
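(For comparison, a sketch of how these two steps are usually written in 1.6:
conf_param is run bare on the MGS, without --device, while deactivate takes
the numeric device index from the first column of 'lctl dl' and is run on
the MDS node. The index 20 below is taken from the listing above; the mds>
prompt is only an assumption about which node holds that osc device.)

$ mgs> lctl conf_param testfs1-OST000b.osc.active=0
$ mds> lctl --device 20 deactivate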
Nathaniel Rutman
2007-Aug-08 16:10 UTC
[Lustre-discuss] recovery after OSS failure, error -22
Thomas Roth wrote:
> Hi all,
>
> after failure of a server contributing two OSTs to our Lustre fs, I'm
> having trouble either getting rid of these OSTs for good or
> re-introducing them. (It's a test system; the data on it may be thrown
> away at any time if necessary.) The system is running Debian Etch,
> kernel 2.6.20, Lustre 1.6.0.1.
>
> Trying to mount the OSTs invariably gives me
>
> kernel: LustreError: Trying to start OBD testfs1-OST000b_UUID using the
> wrong disk . Were the /dev/ assignments rearranged?
> kernel: LustreError: 7792:0:(filter.c:1008:filter_prep()) cannot read
> last_rcvd: rc = -22
> ...

"the wrong disk ." -- the missing disk name implies the last_rcvd file has
been corrupted. (The -22 EINVAL is a consequence of that.) You could try
mounting the disk as type ldiskfs, then erasing the last_rcvd file - this
should cause the OST to regenerate it.
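(A minimal sketch of that recovery path, assuming /dev/sdb1 is one of the
affected OSTs and that /mnt/ostfix and /mnt/ost are scratch mount points of
your choosing; moving last_rcvd aside instead of deleting it keeps a copy in
case it is needed later:)

$ oss> mkdir -p /mnt/ostfix
$ oss> mount -t ldiskfs /dev/sdb1 /mnt/ostfix
$ oss> mv /mnt/ostfix/last_rcvd /root/last_rcvd.OST000b.bak   # remove the corrupted last_rcvd
$ oss> umount /mnt/ostfix
$ oss> mount -t lustre /dev/sdb1 /mnt/ost                     # OST should recreate last_rcvd on start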