I have a small Lustre test cluster with eight OSTs running. The servers
were shut off over the weekend; upon turning them back on and trying to
start up Lustre, I seem to have lost my OSTs.

[root@node1 ~]$ lctl dl
  0 UP mgs MGS MGS 19
  1 UP mgc MGC192.168.1.254@tcp 8acd9bf1-d1ca-8e26-1fad-bd2cf88a2957 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre-mdtlov lustre-mdtlov_UUID 4
  4 UP mds lustre-MDT0000 lustre-MDT0000_UUID 3
  5 UP ost OSS OSS_uuid 3
  6 UP obdfilter lustre-OST0000 lustre-OST0000_UUID 3

Everything in the messages log appears fine, as if it were just a normal
startup of Lustre, except for the messages below. I'm not sure what log
file the error is referring to, and the message gives little detail on
where I should start looking.

Jun 16 20:13:55 node1-eth0 kernel: LustreError:
3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation
CONFIGS/lustre-MDT0000T: -28
Jun 16 20:13:55 node1-eth0 kernel: LustreError:
3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log
lustre-MDT0000 (-28)

Can anyone give me an idea of what happened? Thanks
On Tue, Jun 16, 2009 at 8:25 PM, Michael Di Domenico
<mdidomenico4@gmail.com> wrote:
> I have a small Lustre test cluster with eight OSTs running. The
> servers were shut off over the weekend; upon turning them back on and
> trying to start up Lustre, I seem to have lost my OSTs.
> [...]
> Jun 16 20:13:55 node1-eth0 kernel: LustreError:
> 3106:0:(llog_lvfs.c:577:llog_filp_open()) logfile creation
> CONFIGS/lustre-MDT0000T: -28
> Jun 16 20:13:55 node1-eth0 kernel: LustreError:
> 3106:0:(mgc_request.c:1086:mgc_copy_llog()) Failed to copy remote log
> lustre-MDT0000 (-28)

According to the Lustre manual, the -28 at the end of the line is an
error code:

    -28  -ENOSPC  The file system is out of space or out of inodes.
    Use lfs df (to query the amount of file system space) or lfs df -i
    (to query the number of inodes).

Verified by:

[root@node1 ~]$ df -i
Filesystem            Inodes   IUsed    IFree IUse% Mounted on
/dev/md2             1280000   42132  1237868    4% /
/dev/md0              255232      45   255187    1% /boot
tmpfs                 124645       1   124644    1% /dev/shm
/dev/md3               63872      24    63848    1% /mgs
/dev/md4              255040  255040        0  100% /mdt
/dev/md5            29892608   28726 29863882    1% /ost

I only put 500k files in the filesystem; I would not have thought the
MDT would have used up its inodes that fast.
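[The same lookup can be done without reaching for the manual: LustreError
trailers like "(-28)" are negated POSIX errno values, so a one-liner
resolves them. A generic sketch, not a Lustre-specific tool; it assumes
python3 is on the path.]

```shell
# Resolve the errno behind a LustreError trailer such as "(-28)":
# drop the minus sign and look the number up in the errno table.
python3 -c 'import errno, os; n = 28; print(errno.errorcode[n], "-", os.strerror(n))'
```

On Linux this prints "ENOSPC - No space left on device", matching the
manual's table.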
do you have many small files?

On Tue, Jun 16, 2009 at 8:58 PM, Michael Di Domenico
<mdidomenico4@gmail.com> wrote:
> According to the Lustre manual, the -28 at the end of the line is an
> error code:
>
>     -28  -ENOSPC  The file system is out of space or out of inodes.
>
> Verified by:
>
> [root@node1 ~]$ df -i
> Filesystem            Inodes   IUsed    IFree IUse% Mounted on
> /dev/md4              255040  255040        0  100% /mdt
> /dev/md5            29892608   28726 29863882    1% /ost
> [...]
>
> I only put 500k files in the filesystem; I would not have thought the
> MDT would have used up its inodes that fast.
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
As long as the inode discussion is up, two questions: what exactly is
stored in the inode (how big should I make them)? I've read the manual
about this and it doesn't really say, except for the notation about
stripes/OSTs.

Is there a "proper" way of moving or recreating the MDT filesystem to
hold more inodes, or is backup -> reformat -> restore the proper
procedure?

Sorry to hijack your thread.

Regards,
Timh

2009/6/17 Mag Gam <magawake@gmail.com>:
> do you have many small files?
> [...]

--
Timh Bergström
System Operations Manager
Diino AB - www.diino.com
:wq
On Tue, Jun 16, 2009 at 11:08 PM, Mag Gam <magawake@gmail.com> wrote:
> do you have many small files?

There was a mix of small vs. medium-sized files. I reread the "Sizing
MDT" section in the manual and see my error. That section should be in
big bold letters at the very beginning... :)
2009/6/17 Timh Bergström <timh.bergstrom@diino.net>:
> As long as the inode discussion is up, two questions: what exactly is
> stored in the inode (how big should I make them)? I've read the manual
> about this and it doesn't really say, except for the notation about
> stripes/OSTs.
>
> Is there a "proper" way of moving or recreating the MDT filesystem to
> hold more inodes, or is backup -> reformat -> restore the proper
> procedure?
>
> Sorry to hijack your thread.

It's okay; I have roughly the same question. In my current case the
filesystem is only a test, so I can just recreate it, but I can see this
happening in production. I can prepare for it not to happen, but users
are unpredictable...
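[For the sizing question, the arithmetic is simple enough to sketch. The
numbers below are hypothetical; the real bytes-per-inode ratio is
whatever the MDT was formatted with, e.g. a "-i" value passed through
mkfs.lustre --mkfsoptions.]

```shell
# The MDT needs one inode per file in the entire Lustre filesystem.
# ldiskfs allocates one inode per bytes_per_inode of device space, so
# the inode budget is roughly device_size / bytes_per_inode.
mdt_bytes=$((4 * 1024 * 1024 * 1024))   # hypothetical 4 GiB MDT
bytes_per_inode=4096                    # hypothetical "-i 4096" format ratio
echo "approx inode budget: $((mdt_bytes / bytes_per_inode))"
```

With those assumed numbers the budget is about 1048576 files; turned
around, holding N files needs an MDT of roughly N * bytes_per_inode,
plus headroom.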
Michael Di Domenico wrote:
> According to the Lustre manual, the -28 at the end of the line is an
> error code:
>
>     -28  -ENOSPC  The file system is out of space or out of inodes.
>     Use lfs df (to query the amount of file system space) or lfs df -i
>     (to query the number of inodes).
>
> Verified by:
>
> [root@node1 ~]$ df -i
> Filesystem            Inodes   IUsed    IFree IUse% Mounted on
> /dev/md4              255040  255040        0  100% /mdt
> /dev/md5            29892608   28726 29863882    1% /ost
> [...]
>
> I only put 500k files in the filesystem; I would not have thought the
> MDT would have used up its inodes that fast.

The MDT consumes one inode for each file in the global Lustre file
system. You have plenty of OST space, but no inodes: there are 255K
inodes on the MDS, and you are trying to create 500k files.

cliffw
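[That arithmetic can be checked directly against the df -i output
earlier in the thread:]

```shell
# One MDT inode is consumed per Lustre file, so compare the MDT's total
# inode count (from "df -i" on /mdt) with the number of files created.
mdt_inodes=255040
files_created=500000
if [ "$files_created" -gt "$mdt_inodes" ]; then
    echo "MDT is short by $((files_created - mdt_inodes)) inodes"
fi
```

This prints "MDT is short by 244960 inodes" for the numbers in this
thread, which is exactly why the creates started failing with -ENOSPC
while the OSTs still had space.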