Hi,

on the mgs node e2fsck-1.39.cfs8 shows this:

e2fsck 1.39.cfs8 (7-Apr-2007)
lustre-MDT0000: recovering journal
Clearing orphaned inode 26502799 (uid=1012, gid=100, mode=0100644, size=0)
Clearing orphaned inode 26501309 (uid=1012, gid=100, mode=0100644, size=0)
Clearing orphaned inode 26500161 (uid=1012, gid=100, mode=0100644, size=0)
Clearing orphaned inode 26499062 (uid=1012, gid=100, mode=0100644, size=0)
Clearing orphaned inode 26497822 (uid=1012, gid=100, mode=0100644, size=0)
Clearing orphaned inode 26497454 (uid=1012, gid=100, mode=0100644, size=0)
Clearing orphaned inode 26470885 (uid=1012, gid=100, mode=0100644, size=0)
Clearing orphaned inode 26470857 (uid=1012, gid=100, mode=0100644, size=0)
Clearing orphaned inode 26470840 (uid=1012, gid=100, mode=0100644, size=0)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

This is after building a binutils package on the Lustre clients and then
checking the filesystem.

Any idea how critical it is?

Thanks,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH
On Jul 05, 2007 15:34 +0200, Bernd Schubert wrote:
> on the mgs node e2fsck-1.39.cfs8 shows this:
>
> e2fsck 1.39.cfs8 (7-Apr-2007)
> lustre-MDT0000: recovering journal
> Clearing orphaned inode 26502799 (uid=1012, gid=100, mode=0100644, size=0)
> Clearing orphaned inode 26501309 (uid=1012, gid=100, mode=0100644, size=0)
> Clearing orphaned inode 26500161 (uid=1012, gid=100, mode=0100644, size=0)
> Clearing orphaned inode 26499062 (uid=1012, gid=100, mode=0100644, size=0)
> Clearing orphaned inode 26497822 (uid=1012, gid=100, mode=0100644, size=0)
> Clearing orphaned inode 26497454 (uid=1012, gid=100, mode=0100644, size=0)
> Clearing orphaned inode 26470885 (uid=1012, gid=100, mode=0100644, size=0)
> Clearing orphaned inode 26470857 (uid=1012, gid=100, mode=0100644, size=0)
> Clearing orphaned inode 26470840 (uid=1012, gid=100, mode=0100644, size=0)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
>
> This is after building a binutils package on the Lustre clients and then
> checking the filesystem.
>
> Any idea how critical it is?

This is normal even for local ext3 filesystems, if they crash with open but
unlinked files. If the filesystem has been stopped normally (not forced)
then this shouldn't be happening.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
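For readers unfamiliar with how such orphans come about, the
open-but-unlinked situation Andreas refers to can be reproduced on any
scratch ext3 mount with a small stand-alone C program. This is a
hypothetical sketch, not part of the thread's build; the path
/mnt/ext3/orphan-demo is made up for illustration. The process opens a
file, unlinks it while keeping the descriptor, and the node then crashes
(or loses power) before the process exits, leaving an inode with link
count 0 on ext3's orphan list for the next journal recovery to clear.

/* orphan_demo.c - hypothetical illustration of an open-but-unlinked file.
 * If the machine crashes while this program sleeps, the inode is still
 * allocated but has link count 0, so journal recovery (or e2fsck) will
 * report it as an orphaned inode and clear it, much like the log above.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* assumed scratch mount point; adjust to a throw-away ext3 filesystem */
    int fd = open("/mnt/ext3/orphan-demo", O_CREAT | O_WRONLY, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }

    /* remove the directory entry; the inode stays allocated while fd is open */
    if (unlink("/mnt/ext3/orphan-demo") < 0) {
        perror("unlink");
        return EXIT_FAILURE;
    }

    printf("file is open but unlinked; crash the node now to leave an orphan\n");
    pause();        /* keep the descriptor open indefinitely */

    close(fd);      /* a clean exit releases the inode normally */
    return EXIT_SUCCESS;
}

If the node is power-cycled while the program sleeps, the next journal
recovery or e2fsck run should print a "Clearing orphaned inode" line for
that inode; if the program exits cleanly, no orphan is left behind.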
Bernd Schubert
2007-Jul-05 12:20 UTC
[Lustre-discuss] Re: Clearing orphaned inode 26502799
Andreas Dilger wrote:
>> This is after building a binutils package on the Lustre clients and then
>> checking the filesystem.
>>
>> Any idea how critical it is?
>
> This is normal even for local ext3 filesystems, if they crash with open
> but unlinked files. If the filesystem has been stopped normally (not
> forced) then this shouldn't be happening.

Sorry, I should have been more specific. This happens reproducibly on a
normally unmounted MDS node. Since e2fsck detects it on journal replay,
does this mean the buffers are probably not flushed to disk before the
umount succeeds?

Thanks a lot for your help,
Bernd
Andreas Dilger
2007-Jul-05 14:25 UTC
[Lustre-discuss] Re: Clearing orphaned inode 26502799
On Jul 05, 2007 20:19 +0200, Bernd Schubert wrote:
> Andreas Dilger wrote:
> >> This is after building a binutils package on the Lustre clients and
> >> then checking the filesystem.
> >>
> >> Any idea how critical it is?
> >
> > This is normal even for local ext3 filesystems, if they crash with open
> > but unlinked files. If the filesystem has been stopped normally (not
> > forced) then this shouldn't be happening.
>
> Sorry, I should have been more specific. This happens reproducibly on a
> normally unmounted MDS node.

Is this also after the clients are properly unmounted and/or evicted
(not using --force during MDS cleanup)? In that case it seems to show
that some files are leaking inode reference counts or similar.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
Bernd Schubert
2007-Jul-05 15:07 UTC
[Lustre-discuss] Re: Re: Clearing orphaned inode 26502799
Andreas Dilger wrote:
> On Jul 05, 2007 20:19 +0200, Bernd Schubert wrote:
>> Andreas Dilger wrote:
>> >> This is after building a binutils package on the Lustre clients and
>> >> then checking the filesystem.
>> >>
>> >> Any idea how critical it is?
>> >
>> > This is normal even for local ext3 filesystems, if they crash with
>> > open but unlinked files. If the filesystem has been stopped normally
>> > (not forced) then this shouldn't be happening.
>>
>> Sorry, I should have been more specific. This happens reproducibly on a
>> normally unmounted MDS node.
>
> Is this also after the clients are properly unmounted and/or evicted
> (not using --force during MDS cleanup)? In that case it seems to show
> that some files are leaking inode reference counts or similar.

I still need to test whether it also happens when the clients unmount
first, but I did not give the --force option on the MDS umount.

Any idea how I can debug it?

Thanks,
Bernd
Bernd Schubert
2007-Jul-06 04:03 UTC
[Lustre-discuss] Re: Re: Clearing orphaned inode 26502799
I just got this in dmesg:

[67100.343686] Lustre: Found inode with zero generation or link -- this may
indicate disk corruption (inode: 26398122/151277060, link 0, count 1)
[67100.358660] Lustre: Found inode with zero generation or link -- this may
indicate disk corruption (inode: 26398122/151277060, link 0, count 1)
[67100.368242] Lustre: Found inode with zero generation or link -- this may
indicate disk corruption (inode: 26398122/151277060, link 0, count 1)
[67100.368248] Lustre: Skipped 6 previous similar messages
[67100.386517] Lustre: Found inode with zero generation or link -- this may
indicate disk corruption (inode: 26398114/151277054, link 0, count 1)

So I ran e2fsck again, and the inodes above were among the orphaned inodes
e2fsck found in the journal.

This might or might not be a bug in our additional patches for 2.6.20, but
it is hard to test, since I have no access to your CVS tree for lustre-1.6.1
with 2.6.18 support and older kernels won't run on our hardware.

Let me know if there's anything I can do to debug this.

Cheers,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH
Kalpak Shah
2007-Jul-06 04:27 UTC
[Lustre-discuss] Re: Re: Clearing orphaned inode 26502799
On Fri, 2007-07-06 at 12:03 +0200, Bernd Schubert wrote:
> I just got this in dmesg:
>
> [67100.343686] Lustre: Found inode with zero generation or link -- this may
> indicate disk corruption (inode: 26398122/151277060, link 0, count 1)
> [67100.358660] Lustre: Found inode with zero generation or link -- this may
> indicate disk corruption (inode: 26398122/151277060, link 0, count 1)
> [67100.368242] Lustre: Found inode with zero generation or link -- this may
> indicate disk corruption (inode: 26398122/151277060, link 0, count 1)
> [67100.368248] Lustre: Skipped 6 previous similar messages
> [67100.386517] Lustre: Found inode with zero generation or link -- this may
> indicate disk corruption (inode: 26398114/151277054, link 0, count 1)
>
> So I ran e2fsck again, and the inodes above were among the orphaned inodes
> e2fsck found in the journal.
>
> This might or might not be a bug in our additional patches for 2.6.20, but
> it is hard to test, since I have no access to your CVS tree for lustre-1.6.1
> with 2.6.18 support and older kernels won't run on our hardware.
>
> Let me know if there's anything I can do to debug this.

Hi Bernd,

This happens because it is actually possible to get a zero-generation inode
once every (random < 2^32) inodes.

Ldiskfs needs to have this patch to skip inodes with generation = 0:

--- linux-2.6.9-34.orig/fs/ext3/ialloc.c  2007-01-03 13:30:33.000000000 +0000
+++ linux-2.6.9-34/fs/ext3/ialloc.c  2007-01-03 13:42:04.000000000 +0000
@@ -721,6 +721,8 @@ got:
         insert_inode_hash(inode);
         spin_lock(&sbi->s_next_gen_lock);
         inode->i_generation = sbi->s_next_generation++;
+        if (unlikely(inode->i_generation == 0))
+                inode->i_generation = sbi->s_next_generation++;
         spin_unlock(&sbi->s_next_gen_lock);

         ei->i_state = EXT3_STATE_NEW;

There also needs to be a change in mds_fid2dentry(). These patches can be
found in bz10419.

Thanks,
Kalpak.

> Cheers,
> Bernd
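To make the wrap-around Kalpak describes more concrete, here is a small
user-space sketch. It is purely illustrative and not kernel code; the seed
value is chosen artificially so the wrap happens after a few allocations.
A post-incremented 32-bit counter eventually hands out the value 0, which
the MDS flags as suspicious (see the dmesg lines above), while the extra
check from the patch skips it.

/* generation_wrap.c - hypothetical user-space illustration (not kernel code)
 * of a 32-bit generation counter wrapping through zero, and of the check
 * that skips the reserved value 0.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint32_t next_generation = UINT32_MAX - 2;  /* artificial seed near the wrap */

/* unpatched behaviour: may return 0 when the counter wraps */
static uint32_t new_generation_unpatched(void)
{
    return next_generation++;
}

/* patched behaviour: never hand out the reserved value 0 */
static uint32_t new_generation_patched(void)
{
    uint32_t gen = next_generation++;
    if (gen == 0)
        gen = next_generation++;
    return gen;
}

int main(void)
{
    for (int i = 0; i < 5; i++)
        printf("unpatched: %" PRIu32 "\n", new_generation_unpatched());

    next_generation = UINT32_MAX - 2;  /* reset the counter for the second run */
    for (int i = 0; i < 5; i++)
        printf("patched:   %" PRIu32 "\n", new_generation_patched());

    return 0;
}

The unpatched sequence contains a 0 right after the wrap, while the patched
sequence skips it, mirroring the two added lines in the ialloc.c hunk above.
In the kernel the counter starts from a random seed, which presumably is why
the problem only shows up "once every (random < 2^32) inodes".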
Bernd Schubert
2007-Jul-06 13:25 UTC
[Lustre-discuss] Re: Re: Clearing orphaned inode 26502799
Hello Kalpak,

On Friday 06 July 2007 12:28:29 Kalpak Shah wrote:
> Hi Bernd,
>
> This happens because it is actually possible to get a zero-generation inode
> once every (random < 2^32) inodes.
>
> Ldiskfs needs to have this patch to skip inodes with generation = 0:
>
> --- linux-2.6.9-34.orig/fs/ext3/ialloc.c  2007-01-03 13:30:33.000000000 +0000
> +++ linux-2.6.9-34/fs/ext3/ialloc.c  2007-01-03 13:42:04.000000000 +0000
> @@ -721,6 +721,8 @@ got:
>          insert_inode_hash(inode);
>          spin_lock(&sbi->s_next_gen_lock);
>          inode->i_generation = sbi->s_next_generation++;
> +        if (unlikely(inode->i_generation == 0))
> +                inode->i_generation = sbi->s_next_generation++;
>          spin_unlock(&sbi->s_next_gen_lock);
>
>          ei->i_state = EXT3_STATE_NEW;
>
> There also needs to be a change in mds_fid2dentry(). These patches can be
> found in bz10419.

Thanks a lot, I will test it as soon as possible. Today I was busy all day
with an entirely different issue (not related to Lustre at all).

Thanks again,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH