thr3ads.net - Ocfs2 devel - [Ocfs2-devel] Ocfs2 leaking inodes on failed allocation [Apr 2010]

If this information is useful, please help other people find it:
Share via:

Jan Kara

2010-Apr-20 18:00 UTC

[Ocfs2-devel] Ocfs2 leaking inodes on failed allocation

Hi,

  when running fsstress test on an almost full filesystem we observe
the following errors:
1163.522931] (4774,1):ocfs2_query_inode_wipe:898 ERROR: bug expression:
!(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL))
[ 1163.522938] (4774,1):ocfs2_query_inode_wipe:898 ERROR: Inode 77233
(on-disk 77233) not orphaned! Disk flags  0x1, inode flags 0x0

This is caused by the fact that we succeed in allocating inode in
ocfs2_mknod_locked but later fail to allocate block for symlink data
or directory data because of ENOSPC. So we set i_nlink to 0 and 
by doing iput() we continue through standard inode deletion path but the
inode is not orphaned and thus the error check is triggered.

Now this isn't trivial to fix (at least AFAICS) so I wanted to share my
thoughts before investing too much time in writing the patch which would
be then rejected.

The easiest solution would be to always create inodes in the orphan
directory (we even have a function ocfs2_create_inode_in_orphan for this).
The downside this has would be that I expect we would start contending on
orphan dir i_mutex quite early and thus fs scalability would suffer a lot.
Also there's some additional IO and CPU cost involved...

Adding inode to orphan dir after we find out we cannot finish allocation
is IMHO no-go. Because the filesystem is close to ENOSPC, we even don't
have to have a block to extend orphan directory to accomodate new directory
entry. Also adding to orphan directory has to happen outside of a
transaction (due to lock ordering) but we have a transaction already
started and cannot stop it without adding a link to the inode somewhere
(otherwise we would leak the inode in case of crash).

The last idea I have is that we could "undo" the inode allocation and
other operations we did in the transaction so far. But looking at the code
it would get nasty quickly - all the xattr handling which gets inode locks,
starts & stops transactions, etc...

Any other ideas? What would make things much easier would be if orphan
handling was more lightweight like it is e.g. in ext3 / ext4 - there we
have just linked list of orphaned inodes and so if we decide an inode needs
to be orphaned, we just have to modify the superblock (orphan list head)
and the inode (to point at the current orphan list head)... In OCFS2 we
could have a per-slot lists like this but a change like this would probably
be an overkill for the above bug so it would make sence only if there would
be other benefits from this.

								Honza
-- 
Jan Kara <jack at suse.cz>
SUSE Labs, CR

Joel Becker

2010-Apr-20 19:04 UTC

head link

[Ocfs2-devel] Ocfs2 leaking inodes on failed allocation

On Tue, Apr 20, 2010 at 08:00:54PM +0200, Jan Kara
wrote:> the following errors:
> 1163.522931] (4774,1):ocfs2_query_inode_wipe:898 ERROR: bug expression:
> !(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL))
> [ 1163.522938] (4774,1):ocfs2_query_inode_wipe:898 ERROR: Inode 77233
> (on-disk 77233) not orphaned! Disk flags  0x1, inode flags 0x0
	See the thread starting "Subject: [Ocfs2-devel] [PATCH] ocfs2:
alloc orphaned inode in ocfs2_symlink".  I'll sum up here.
> The easiest solution would be to always create inodes in the orphan
> directory (we even have a function ocfs2_create_inode_in_orphan for this).
> The downside this has would be that I expect we would start contending on
> orphan dir i_mutex quite early and thus fs scalability would suffer a lot.
> Also there's some additional IO and CPU cost involved...
	Yeah, this is a non-starter.
> The last idea I have is that we could "undo" the inode allocation
and
> other operations we did in the transaction so far. But looking at the code
> it would get nasty quickly - all the xattr handling which gets inode locks,
> starts & stops transactions, etc...
	This is the "best" solution, but it requires some thought and
care.  We'd love to get here someday.
> Any other ideas? What would make things much easier would be if orphan
> handling was more lightweight like it is e.g. in ext3 / ext4 - there we
> have just linked list of orphaned inodes and so if we decide an inode needs
> to be orphaned, we just have to modify the superblock (orphan list head)
> and the inode (to point at the current orphan list head)... In OCFS2 we
> could have a per-slot lists like this but a change like this would probably
> be an overkill for the above bug so it would make sence only if there would
> be other benefits from this.
	We're not going to change our orphan storage, either.  This
still needs locking in the cluster, and that's just a pain.
	Near the end of the referenced thread, Mark told Dong Yang to
implement the OCFS2_INODE_SKIP_ORPHAN_DIR flag.  This flag merely lets
delete_inode know that we never orphaned the sucker.  delete_inode can
do the rest of its work without triggering the above warning.

Joel

-- 

Life's Little Instruction Book #237

	"Seek out the good in people."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

Mark Fasheh

2010-Apr-20 19:18 UTC

head link

[Ocfs2-devel] Ocfs2 leaking inodes on failed allocation

On Tue, Apr 20, 2010 at 08:00:54PM +0200, Jan Kara
wrote:>   Hi,
> 
>   when running fsstress test on an almost full filesystem we observe
> the following errors:
> 1163.522931] (4774,1):ocfs2_query_inode_wipe:898 ERROR: bug expression:
> !(di->i_flags & cpu_to_le32(OCFS2_ORPHANED_FL))
> [ 1163.522938] (4774,1):ocfs2_query_inode_wipe:898 ERROR: Inode 77233
> (on-disk 77233) not orphaned! Disk flags  0x1, inode flags 0x0
> 
> This is caused by the fact that we succeed in allocating inode in
> ocfs2_mknod_locked but later fail to allocate block for symlink data
> or directory data because of ENOSPC. So we set i_nlink to 0 and 
> by doing iput() we continue through standard inode deletion path but the
> inode is not orphaned and thus the error check is triggered.
> 
> Now this isn't trivial to fix (at least AFAICS) so I wanted to share my
> thoughts before investing too much time in writing the patch which would
> be then rejected.
> 
> The easiest solution would be to always create inodes in the orphan
> directory (we even have a function ocfs2_create_inode_in_orphan for this).
> The downside this has would be that I expect we would start contending on
> orphan dir i_mutex quite early and thus fs scalability would suffer a lot.
> Also there's some additional IO and CPU cost involved...
> 
> Adding inode to orphan dir after we find out we cannot finish allocation
> is IMHO no-go. Because the filesystem is close to ENOSPC, we even don't
> have to have a block to extend orphan directory to accomodate new directory
> entry. Also adding to orphan directory has to happen outside of a
> transaction (due to lock ordering) but we have a transaction already
> started and cannot stop it without adding a link to the inode somewhere
> (otherwise we would leak the inode in case of crash).
> 
> The last idea I have is that we could "undo" the inode allocation
and
> other operations we did in the transaction so far. But looking at the code
> it would get nasty quickly - all the xattr handling which gets inode locks,
> starts & stops transactions, etc...
> 
> Any other ideas? What would make things much easier would be if orphan
> handling was more lightweight like it is e.g. in ext3 / ext4 - there we
> have just linked list of orphaned inodes and so if we decide an inode needs
> to be orphaned, we just have to modify the superblock (orphan list head)
> and the inode (to point at the current orphan list head)... In OCFS2 we
> could have a per-slot lists like this but a change like this would probably
> be an overkill for the above bug so it would make sence only if there would
> be other benefits from this.
Flag the failed inodes in memory (see ocfs2_inode_info->ip_flags) and
special case it in ocfs2_delete_inode(). That's the easiest way to go about
this without impacting performance. Then our window for leaking blocks is
only if the box takes a reboot (or crash) right before we hit
delete_inode(). That's an acceptable risk (we're talking a very small
number
of blocks and a very small window of time).
	--Mark

--
Mark Fasheh

Ocfs2 devel - Apr 2010 - Ocfs2 leaking inodes on failed allocation

[Ocfs2-devel] Ocfs2 leaking inodes on failed allocation

[Ocfs2-devel] Ocfs2 leaking inodes on failed allocation

[Ocfs2-devel] Ocfs2 leaking inodes on failed allocation