Jan Kara
2009-Jan-12 22:06 UTC
[Ocfs2-devel] Bug in inode deletion code leading to stale inodes
Hello,
  I've hit a bug in OCFS2 delete code which results in inodes being left
on disk without any links to them. The workload triggering this creates
directories on one node and deletes them on another node in the cluster.
The inode is not deleted because both nodes bail out from
ocfs2_delete_inode() with:
  Skipping delete of 100405 because it is in use on other nodes

The scenario which I think is happening is as follows:

node1                                   node2
                                        rmdir("d");
                                        ocfs2_remote_dentry_delete()
ocfs2_dentry_convert_worker()
                                        finishes ocfs2_unlink()
                                        eventually enters ocfs2_delete_inode()
                                          ocfs2_inode_lock()
                                          ocfs2_query_inode_wipe() -> fail
                                          ocfs2_inode_unlock()
ocfs2_dentry_post_unlock()
  ocfs2_drop_dentry_lock()
    iput()
      ocfs2_delete_inode()
        ocfs2_inode_lock()
        ocfs2_query_inode_wipe() -> fail
        ocfs2_inode_unlock()
        clear_inode()
                                          clear_inode()

The question is how to avoid this. It seems to me that we have to really
do open_lock() and not just trylock to avoid the race. Is there any reason
why we cannot move the open_lock() before inode_lock() in
ocfs2_delete_inode()?

								Honza
--
Jan Kara <jack at suse.cz>
SUSE Labs, CR
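For context, the delete path both nodes run here looks roughly like the
sketch below. This is a simplified illustration modeled on
fs/ocfs2/inode.c, not verbatim kernel code: helper signatures are reduced
and error handling is elided, so treat the exact arguments as assumptions.

    /* Simplified sketch of ocfs2_delete_inode(), modeled on
     * fs/ocfs2/inode.c; signatures reduced, error handling elided --
     * an illustration, not the actual kernel code. */
    static void ocfs2_delete_inode_sketch(struct inode *inode)
    {
            struct buffer_head *di_bh = NULL;
            int wipe = 0;

            ocfs2_inode_lock(inode, &di_bh, 1);     /* EX cluster inode lock */

            /* ocfs2_query_inode_wipe() trylocks the open lock at EX.  Any
             * other node that has not yet reached ocfs2_clear_inode() still
             * holds its PR, so the trylock fails and we log "Skipping
             * delete ... in use on other nodes".  In the race above, each
             * node queries while the other still holds its PR, so neither
             * wipes the inode. */
            ocfs2_query_inode_wipe(inode, di_bh, &wipe);
            if (wipe)
                    ocfs2_wipe_inode(inode, di_bh); /* unlink from orphan dir, free */

            ocfs2_inode_unlock(inode, 1);
            brelse(di_bh);
            clear_inode(inode);  /* ocfs2_clear_inode() drops this node's PR here */
    }

Nothing orders one node's wipe query against the other node's
clear_inode(), which is exactly the window the trace above exploits.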
Mark Fasheh
2009-Jan-15 01:59 UTC
[Ocfs2-devel] Bug in inode deletion code leading to stale inodes
On Mon, Jan 12, 2009 at 11:06:35PM +0100, Jan Kara wrote:
> I've hit a bug in OCFS2 delete code which results in inodes being left on
> disk without any links to them. The workload triggering this creates
> directories on one node and deletes them on another node in the cluster.
> The inode is not deleted because both nodes bail out from
> ocfs2_delete_inode() with:
>   Skipping delete of 100405 because it is in use on other nodes
>
> The scenario which I think is happening is as follows:
>
> node1                                   node2
>                                         rmdir("d");
>                                         ocfs2_remote_dentry_delete()
> ocfs2_dentry_convert_worker()
>                                         finishes ocfs2_unlink()
>                                         eventually enters ocfs2_delete_inode()
>                                           ocfs2_inode_lock()
>                                           ocfs2_query_inode_wipe() -> fail
>                                           ocfs2_inode_unlock()
> ocfs2_dentry_post_unlock()
>   ocfs2_drop_dentry_lock()
>     iput()
>       ocfs2_delete_inode()
>         ocfs2_inode_lock()
>         ocfs2_query_inode_wipe() -> fail
>         ocfs2_inode_unlock()
>         clear_inode()
>                                           clear_inode()

FWIW, your analysis looks correct to me.

> The question is how to avoid this. It seems to me that we have to really
> do open_lock() and not just trylock to avoid the race. Is there any reason
> why we cannot move the open_lock() before inode_lock() in
> ocfs2_delete_inode()?

We can't do that because open_lock() will deadlock with nodes still using
the inode. The way the open lock works is that each node takes a PR
(protected read) lock on it when the inode is instantiated. That PR is
dropped in ocfs2_clear_inode(). If a node wants to delete an inode, it
will trylock at EX to see if any other nodes have an active lock.

I think we have to look at how the downconvert thread deals with these
locks. Either we push some parts to a work queue, or we figure out how to
reorder some of the operations.

The problem with pushing off to a work queue is that we run the risk of
slowing down unlink-related ops. Pushing the dentry downconvert away will
result in high unlink() latency, while pushing the final iput() away runs
the risk of bouncing the orphan dir lock (if the iput() comes after the
deleting node's query in ->delete_inode()).

Do you think there's a way we could just reverse the order - iput the
inode BEFORE downconverting the dentry lock?
	--Mark

--
Mark Fasheh
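To make the deadlock concrete, here is a sketch of the open-lock lifecycle
Mark describes, with simplified calls to the dlmglue helpers; the -EAGAIN
convention follows ocfs2_try_open_lock(), but the exact signatures and
return values should be treated as assumptions.

    /* Open-lock lifecycle, simplified sketch; the real helpers live in
     * fs/ocfs2/dlmglue.c and are called from fs/ocfs2/inode.c. */

    /* (a) Inode instantiation: every node using the inode takes PR.
     *     PR grants are mutually compatible, so this never contends. */
    ocfs2_open_lock(inode);

    /* (b) Delete path: probe for remote users without blocking. */
    if (ocfs2_try_open_lock(inode, 1) == -EAGAIN) {
            /* Some other node still holds its PR, so the inode is treated
             * as in use and the wipe is skipped.  A *blocking* EX request
             * here would instead wait for every PR holder -- nodes that
             * may keep the inode open indefinitely -- hence the deadlock
             * with nodes still using the inode. */
    }

    /* (c) Final local teardown: the PR is dropped in ocfs2_clear_inode(). */
    ocfs2_open_unlock(inode);

This is why the probe must stay a trylock, and why the discussion turns on
*where* in the teardown sequence the dentry downconvert and the final
iput() run, rather than on the lock mode itself.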