thr3ads.net - Lustre discuss - [Lustre-discuss] big problem: read-only fs [Sep 2008]

Papp Tamás

2008-Sep-16 16:47 UTC

[Lustre-discuss] big problem: read-only fs

hi All,

This morning we saw that some clients could not connect to one of our nodes.

I ran fsck on the node and remounted it. Fsck found a lot of errors.

After this I see the following in the logs again:

Sep 16 18:16:08 node1 kernel: LustreError: 2538:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init failed for resource 80132: rc -2
Sep 16 18:16:08 node1 kernel: LustreError: 2538:0:(ldlm_resource.c:719:ldlm_resource_add()) Skipped 15 previous similar messages
Sep 16 18:27:15 node1 kernel: LustreError: 2487:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init failed for resource 232490: rc -2
Sep 16 18:27:15 node1 kernel: LustreError: 2487:0:(ldlm_resource.c:719:ldlm_resource_add()) Skipped 16 previous similar messages
Sep 16 18:30:08 node1 kernel: LDISKFS-fs error (device sdb1): ldiskfs_ext_find_extent: bad header in inode #58262056: invalid magic - magic 0, entries 0, max 0(0), depth 0(0)
Sep 16 18:30:08 node1 kernel: Remounting filesystem read-only
Sep 16 18:30:16 node1 kernel: Lustre: Skipped 1 previous similar message
Sep 16 18:30:16 node1 kernel: LustreError: 3986:0:(fsfilt-ldiskfs.c:1318:fsfilt_ldiskfs_write_record()) can't start transaction for 37 blocks (128 bytes)
Sep 16 18:30:16 node1 kernel: LustreError: 3986:0:(fsfilt-ldiskfs.c:1318:fsfilt_ldiskfs_write_record()) Skipped 53 previous similar messages
Sep 16 18:30:16 node1 kernel: LustreError: 3986:0:(filter.c:360:filter_client_free()) zeroing out client 2bee00d4-c421-4c8e-bf27-8dd131e0bc55 at idx 51 (14720) in last_rcvd rc -30
Sep 16 18:34:32 node1 kernel: LustreError: 2426:0:(fsfilt-ldiskfs.c:281:fsfilt_ldiskfs_start()) error starting handle for op 8 (71 credits): rc -30
Sep 16 18:34:32 node1 kernel: LustreError: 2426:0:(fsfilt-ldiskfs.c:281:fsfilt_ldiskfs_start()) Skipped 36 previous similar messages
Sep 16 18:37:16 node1 kernel: LustreError: 2416:0:(ldlm_resource.c:719:ldlm_resource_add()) lvbo_init failed for resource 212576: rc -2
Sep 16 18:37:16 node1 kernel: LustreError: 2416:0:(ldlm_resource.c:719:ldlm_resource_add()) Skipped 15 previous similar messages
Sep 16 18:43:20 node1 kernel: LustreError: 2341:0:(fsfilt-ldiskfs.c:281:fsfilt_ldiskfs_start()) error starting handle for op 8 (71 credits): rc -30
Sep 16 18:43:20 node1 kernel: LustreError: 2341:0:(filter.c:273:filter_client_add()) unable to start transaction: rc -30
Sep 16 18:43:20 node1 kernel: LustreError: 2341:0:(filter.c:273:filter_client_add()) Skipped 36 previous similar messages
Sep 16 18:43:20 node1 kernel: LustreError: 2341:0:(filter.c:294:filter_client_add()) error writing last_rcvd client idx 34: rc -30
Sep 16 18:43:20 node1 kernel: LustreError: 2341:0:(filter.c:294:filter_client_add()) Skipped 36 previous similar messages
Sep 16 18:43:20 node1 kernel: LustreError: 2341:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-30)  req at ffff81000276de00 x8/t0 o8-><?>@<?>:-1 lens 240/144 ref 0 fl Interpret:/0/0 rc -30/0
Sep 16 18:43:21 node1 kernel: LustreError: 2341:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 5 previous similar messages
Sep 16 18:44:10 node1 kernel: LustreError: 2399:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-30)  req at ffff81005f5d0400 x38/t0 o8-><?>@<?>:-1 lens 240/144 ref 0 fl Interpret:/0/0 rc -30/0
Sep 16 18:44:10 node1 kernel: LustreError: 2399:0:(ldlm_lib.c:1442:target_send_reply_msg()) Skipped 1 previous similar message
Sep 16 18:44:35 node1 kernel: LustreError: 2341:0:(filter.c:273:filter_client_add()) unable to start transaction: rc -30
Sep 16 18:44:35 node1 kernel: LustreError: 2341:0:(filter.c:273:filter_client_add()) Skipped 2 previous similar messages
Sep 16 18:44:35 node1 kernel: LustreError: 2341:0:(filter.c:294:filter_client_add()) error writing last_rcvd client idx 51: rc -30
Sep 16 18:44:35 node1 kernel: LustreError: 2341:0:(filter.c:294:filter_client_add()) Skipped 2 previous similar messages

The errno -2 messages appeared right after fsck, so that's OK.

But why is -30 showing up? I hoped it would disappear after fsck, but I
see it again. What could cause this problem? How can I solve it?

Thank you,

tamas

Bernd Schubert

2008-Sep-16 16:54 UTC

On Tuesday 16 September 2008 18:47:43 Papp Tamás wrote:
> hi All,
>
> This morning we see on some client, it cannot connect to one of our node.
>
> I run fsck on the node, and remounted it. Fsck found a lot of errors.
>
>
> After this I see this on the logs again:
[...]
>
> Sep 16 18:30:08 node1 kernel: LDISKFS-fs error (device sdb1):
> ldiskfs_ext_find_extent: bad header in inode #58262056: invalid magic -
> magic 0, entries 0, max 0(0), depth 0(0)

Did you use the latest e2fsprogs from Sun?

-- 
Bernd Schubert
Q-Leap Networks GmbH

Papp Tamás

2008-Sep-16 17:05 UTC

Bernd Schubert wrote:
> On Tuesday 16 September 2008 18:47:43 Papp Tamás wrote:
>
>> hi All,
>>
>> This morning we see on some client, it cannot connect to one of our node.
>>
>> I run fsck on the node, and remounted it. Fsck found a lot of errors.
>>
>> After this I see this on the logs again:
>
> [...]
>
>> Sep 16 18:30:08 node1 kernel: LDISKFS-fs error (device sdb1):
>> ldiskfs_ext_find_extent: bad header in inode #58262056: invalid magic -
>> magic 0, entries 0, max 0(0), depth 0(0)
>
> Did you use the latest e2fsprogs from Sun?

No, I use 1.40.4.cfs1, but anyway it's a good idea.

However, that was the current version when Lustre 1.6.4.3 was current.

Thanks for the advice, I'll try it.

tamas

Andreas Dilger

2008-Sep-16 22:36 UTC

On Sep 16, 2008  18:47 +0200, Papp Tamás wrote:
> I run fsck on the node, and remounted it. Fsck found a lot of errors.
> 
> 
> After this I see this on the logs again:
> Sep 16 18:30:08 node1 kernel: LDISKFS-fs error (device sdb1): 
> ldiskfs_ext_find_extent: bad header in inode #58262056: invalid magic - 
> magic 0, entries 0, max 0(0), depth 0(0)
> Sep 16 18:30:08 node1 kernel: Remounting filesystem read-only
> 
> But why does -30 is here? I hoped, it will disappear after fsck, but I 
> see again. What could cause this problem? How can I solve it?
-30 = -EROFS, caused by the extent header error.  This was fixed in
very recent Lustre e2fsprogs, do you have the latest released version?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
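
As a side note on the return codes in the log: the rc values are negated
Linux errno numbers, so -2 is ENOENT and -30 is EROFS, as stated above. A
minimal sketch for decoding such values, assuming a Linux host with
Python 3 (the helper name decode_rc is made up for illustration and is
not part of Lustre):

```python
import errno
import os


def decode_rc(rc):
    """Return 'NAME (message)' for a kernel-style negative return code."""
    code = abs(rc)  # the kernel logs errno values negated
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# The two return codes that appear in the log above.
for rc in (-2, -30):
    print(rc, decode_rc(rc))
```

The errno numbering is platform-specific, so this should be run on the
same kind of system that produced the log.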

Papp Tamas

2008-Sep-17 14:08 UTC

Andreas Dilger wrote:
> On Sep 16, 2008  18:47 +0200, Papp Tamás wrote:
>
>> I run fsck on the node, and remounted it. Fsck found a lot of errors.
>>
>>
>> After this I see this on the logs again:
>> Sep 16 18:30:08 node1 kernel: LDISKFS-fs error (device sdb1): 
>> ldiskfs_ext_find_extent: bad header in inode #58262056: invalid magic -
>> magic 0, entries 0, max 0(0), depth 0(0)
>> Sep 16 18:30:08 node1 kernel: Remounting filesystem read-only
>>
>> But why does -30 is here? I hoped, it will disappear after fsck, but I 
>> see again. What could cause this problem? How can I solve it?
>>     
>
> -30 = -EROFS, caused by the extent header error.  This was fixed in
> very recent Lustre e2fsprogs, do you have the latest released version?
>
>   
Well, the recent e2fsprogs from Sun did not help.

So I tried to move the files off the node, but it's not so simple.
I have some questions.

1.
$ lfs df|grep OST0002
cubefs-OST0002_UUID  1845110624 1512955404 332155220   81% /W[OST:2]
$ lctl dl|grep OST0002
  4 UP osc cubefs-OST0002-osc-ffff81002b2b5000 
345f312a-51e9-b9de-b462-35a56ae76341 5

Which one should I use?

Anyway:

$ lfs find --obd cubefs-OST0002-osc-ffff81002b2b5000 -r .
error: setup_obd_uuids: unknown obduuid: cubefs-OST0002-osc-ffff81002b2b5000
./1 2
./1 23
./1 234
./1 2345

$ lfs find --obd cubefs-OST0002_UUID -r .
error: setup_obd_uuids: unknown obduuid: cubefs-OST0002_UUID
./1 2
./1 23
./1 234
./1 2345

But:

$ lfs getstripe .
OBDS:
. has no stripe info
./1 2
        obdidx           objid          objid            group
             3          455101        0x6f1bd                0

./1 23
        obdidx           objid          objid            group
             3          455125        0x6f1d5                0

./1 234
        obdidx           objid          objid            group
             4          448480        0x6d7e0                0

./1 2345
        obdidx           objid          objid            group
             2          455201        0x6f221                0


2.
# mount|grep lustre
/dev/sdb1 on /mnt/cubefs/ost-1 type lustre (rw)

# grep lustre /proc/mounts
/dev/sdb /mnt/cubefs/ost-1 lustre ro 0 0

Why don't I see sdb1 in /proc?
Also why do I see ro in /proc?

3.
samba:~$ cat /proc/fs/lustre/lov/cubefs-clilov-ffff8100330c4800/target_obd
samba:~$

I have another cluster, and for that one it shows the right values (on the
same machine, at the same time).
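
Regarding question 1: since lfs find --obd rejects both names here, one
possible workaround is to filter the lfs getstripe output by the obdidx
column instead. The sketch below assumes the output layout shown above (a
path line followed by an indented header and one indented row per
stripe). files_by_obdidx and SAMPLE are illustrative names, not part of
the Lustre tools.

```python
from collections import defaultdict

# Output copied from the `lfs getstripe .` run above.
SAMPLE = """\
OBDS:
. has no stripe info
./1 2
        obdidx           objid          objid            group
             3          455101        0x6f1bd                0

./1 23
        obdidx           objid          objid            group
             3          455125        0x6f1d5                0

./1 234
        obdidx           objid          objid            group
             4          448480        0x6d7e0                0

./1 2345
        obdidx           objid          objid            group
             2          455201        0x6f221                0
"""


def files_by_obdidx(getstripe_output):
    """Map each OST index (obdidx) to the paths that have a stripe on it."""
    result = defaultdict(list)
    current = None  # path whose stripe rows are being read
    for line in getstripe_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Unindented lines are either markers/notices or a file path.
            if line == "OBDS:" or line.endswith("has no stripe info"):
                current = None
            else:
                current = line
            continue
        fields = line.split()
        # Stripe rows start with a decimal obdidx; the header row does not.
        if current is not None and fields[0].isdigit():
            idx = int(fields[0])
            if current not in result[idx]:
                result[idx].append(current)
    return dict(result)


# List the files that have an object on OST index 2 (cubefs-OST0002).
for path in files_by_obdidx(SAMPLE).get(2, []):
    print(path)
```

Because paths are taken as whole lines, file names containing spaces
(such as "./1 2345") are preserved.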

Lustre 1.6.4.3

Thank you,

tamas
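
On question 2, a likely explanation (not confirmed in this thread): on
systems of this era, mount prints the contents of /etc/mtab, which is
bookkeeping written at mount time, while /proc/mounts is the kernel's
current view. After the "Remounting filesystem read-only" message in the
log, the kernel would report ro even though mtab still says rw. The
sketch below checks the kernel's view for a mount point. The function
names are hypothetical, and the sample line is the /proc/mounts entry
from above with the device field assumed to read /dev/sdb.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list reported for mount_point, or None if absent."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return None


def is_read_only(mounts_text, mount_point):
    """True if the mount point is listed and carries the 'ro' option."""
    opts = mount_options(mounts_text, mount_point)
    return opts is not None and "ro" in opts


# Sample entry; the device name here is an assumption (see the note above).
SAMPLE = "/dev/sdb /mnt/cubefs/ost-1 lustre ro 0 0\n"
print(is_read_only(SAMPLE, "/mnt/cubefs/ost-1"))
```

On a live server the text would come from reading /proc/mounts rather
than from SAMPLE.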

Papp Tamas

2008-Sep-17 17:49 UTC

On Wed, Sep 17, 2008 at 04:08:12PM +0200, Papp Tamas wrote:
>
> Well, the recent e2fsprogs from Sun did not help.

Some more information. It's strange, but there is one client
(backupgwm, a gw client between two clusters: one is the main
cluster and the other one is a backup system).

This one works great without any errors. There was another one, but I
unmounted it. After I remounted it, it was working like the others.

Any idea what's happening?

Thanks,

tamas

Papp Tamás

2008-Sep-18 17:39 UTC

Papp Tamas wrote:
> Andreas Dilger wrote:
>
>> On Sep 16, 2008  18:47 +0200, Papp Tamás wrote:
>>
>>> I run fsck on the node, and remounted it. Fsck found a lot of errors.
>>>
>>>
>>> After this I see this on the logs again:
>>> Sep 16 18:30:08 node1 kernel: LDISKFS-fs error (device sdb1):
>>> ldiskfs_ext_find_extent: bad header in inode #58262056: invalid magic -
>>> magic 0, entries 0, max 0(0), depth 0(0)
>>> Sep 16 18:30:08 node1 kernel: Remounting filesystem read-only
>>>
>>> But why does -30 is here? I hoped, it will disappear after fsck, but I
>>> see again. What could cause this problem? How can I solve it?
>>
>> -30 = -EROFS, caused by the extent header error.  This was fixed in
>> very recent Lustre e2fsprogs, do you have the latest released version?
>
> Well, the recent e2fsprogs from Sun did not help.
>
> So I tried to move away the files from the node, but it's not so simple,
> I have some question.
>
> 1.
> $ lfs df|grep OST0002
> cubefs-OST0002_UUID  1845110624 1512955404 332155220   81% /W[OST:2]
> $ lctl dl|grep OST0002
>   4 UP osc cubefs-OST0002-osc-ffff81002b2b5000 
> 345f312a-51e9-b9de-b462-35a56ae76341 5
>
> Which one should I use?
>
> Anyway:
>
> $ lfs find --obd cubefs-OST0002-osc-ffff81002b2b5000 -r .
> error: setup_obd_uuids: unknown obduuid: cubefs-OST0002-osc-ffff81002b2b5000
> ./1 2
> ./1 23
> ./1 234
> ./1 2345
>
> $ lfs find --obd cubefs-OST0002_UUID -r .
> error: setup_obd_uuids: unknown obduuid: cubefs-OST0002_UUID
> ./1 2
> ./1 23
> ./1 234
> ./1 2345
>
> But:
>
> $ lfs getstripe .
> OBDS:
> . has no stripe info
> ./1 2
>         obdidx           objid          objid            group
>              3          455101        0x6f1bd                0
>
> ./1 23
>         obdidx           objid          objid            group
>              3          455125        0x6f1d5                0
>
> ./1 234
>         obdidx           objid          objid            group
>              4          448480        0x6d7e0                0
>
> ./1 2345
>         obdidx           objid          objid            group
>              2          455201        0x6f221                0
>
>
> 2.
> # mount|grep lustre
> /dev/sdb1 on /mnt/cubefs/ost-1 type lustre (rw)
>
> # grep lustre /proc/mounts
> /dev/sdb /mnt/cubefs/ost-1 lustre ro 0 0
>
> Why don't I see sdb1 in /proc?
> Also why do I see ro in /proc?
>
> 3.
> samba:~$ cat /proc/fs/lustre/lov/cubefs-clilov-ffff8100330c4800/target_obd
> samba:~$
>
> I have an other cluster, on that it shows the right values (on the same 
> machine at the same time too).
>
>   
Is there no answer to any of these questions?

Did I do something wrong?

tamas

Brian J. Murrell

2008-Sep-18 18:07 UTC

On Thu, 2008-09-18 at 19:39 +0200, Papp Tamás wrote:
>
> There is no answer about any of these issues?
>
> Did I do something wrong?

No, but you have to understand that we all have tasks that we must
complete in order to meet our goals and that responding to questions
here is done on an as-time-permits basis.  Some days there is time and
some there is not.  You will just have to bear with us and be patient.

I really don't want this to come off like a sales pitch, but just so
you know, if your operations require that you get timely responses to
problems and questions, Sun does offer support contracts that guarantee
response times.

Cheers,
b.

Papp Tamás

2008-Sep-18 18:20 UTC

Brian J. Murrell wrote:
> No, but you have to understand that we all have tasks that we must
> complete in order to meet our goals and that responding to questions
> here is done on an as-time-permits basis.  Some days there is time and
> some there is not.  You will just have to bear with us and be patient.

I'm really sorry if it looks like I wanted to rush anybody for an
answer. I just got used to getting relatively quick responses. I searched
the web and the archives and saw a mail describing a failure like this
one without any answer. So I thought the problem was with the questions
or with how the cluster was set up.

> I really don't want this to come off like a sales pitch, but just so you
> know, if your operations require that you get timely responses to
> problems and questions Sun does offer support contracts that guarantee
> response times.

Actually I tried, without feedback :)

Thank you for any help,

tamas

Troy Benjegerdes

2008-Sep-18 18:23 UTC

> > # grep lustre /proc/mounts
> > /dev/sdb /mnt/cubefs/ost-1 lustre ro 0 0
> >
> > Why don't I see sdb1 in /proc?
> > Also why do I see ro in /proc?
> >
> > 3.
> > samba:~$ cat /proc/fs/lustre/lov/cubefs-clilov-ffff8100330c4800/target_obd
> > samba:~$
> >
> > I have an other cluster, on that it shows the right values (on the same
> > machine at the same time too).
>
> There is no answer about any of these issues?
>
> Did I do something wrong?

Someone needs to be motivated to answer your questions. Either because
someone is paying them, or because they get some benefit out of spending
the time.

What's great about open source is anyone has access to the code, and you
can exchange something other than money to get help with software. But
there still needs to be something the person answering your question
gets in return for spending the time and effort on it.

Brian J. Murrell

2008-Sep-18 18:40 UTC

On Thu, 2008-09-18 at 13:23 -0500, Troy Benjegerdes wrote:
>
> Someone needs to be motivated to answer your questions.. Either because
> someone is paying them, or because they get some benefit out of spending
> the time.

Indeed.  The benefit for most of us here is purely the satisfaction one
gets from helping somebody else.  Unfortunately that satisfaction
doesn't go very far when one has to explain why one has not met one's
objectives.  :-/

Now, where in this pile did that list of objectives get to...

b.

Brian J. Murrell

2008-Sep-18 18:43 UTC

On Thu, 2008-09-18 at 20:20 +0200, Papp Tamás wrote:
>
> Actually I tried without feedback:)

Ahhhh.  That's not good.  Did you try
http://www.sun.com/software/products/lustre/support.xml?

b.

Papp Tamás

2008-Sep-18 19:14 UTC

Brian J. Murrell wrote:
> On Thu, 2008-09-18 at 20:20 +0200, Papp Tamás wrote:
>
>> Actually I tried without feedback:)
>
> Ahhhh.  That's not good.  Did you try
> http://www.sun.com/software/products/lustre/support.xml?

If I remember correctly, yes. I think the problem is that here in Hungary
there is no Lustre support at Sun; perhaps nobody can provide it?
I don't know.

It was about half a year to a year ago, right after Sun bought Clusterfs.

Maybe I should try it again :)


tamas

ps.: Again, I didn't mean to demand anything.

Brian J. Murrell

2008-Sep-18 19:20 UTC

On Thu, 2008-09-18 at 21:14 +0200, Papp Tamás wrote:
> It was about one or half year ago, right after Sun bought Clusterfs.
>
> Maybe I should try it again:)

Indeed, please do.  A lot has happened in terms of integration into Sun
in the last year.

> ps.: Again, I didn't want to arrogate for anything.

Understood.  No worries.  I just wanted to explain why you might not be
seeing any responses and that it was nothing you did wrong.

b.

Ms. Megan Larko

2008-Sep-18 19:27 UTC

WRT paid Lustre support, I cannot speak with regards to the country of
Hungary, but even within the U.S.  Lustre support from Sun is
difficult to get.

I, on behalf of my company, contacted Sun for Lustre support (one
person was Mr. Mike McClain).
I received a price quote.   I responded that my company would agree to
pay that U.S. dollar amount and asked whether it would be possible to see
what the dollar amount actually covered in terms of service support.

That was two months ago.   I have not heard anything further from Sun.

....but the List does tend to answer....   ;-)

Megan Larko
Center for Research on Environment & Water (CREW)
Beltsville, MD
U.S.A.
