Hi all,
    j_trans_barrier in ocfs2 is used to protect some journal operations in
ocfs2. Normally it is used as follows:
1. In journal transactions: when we start a transaction, we down_read it,
   and j_num_trans is increased accordingly (in a cluster environment). It
   is up_read when we do ocfs2_commit_trans.
2. In ocfs2_commit_cache: we down_write it, then call jbd2_journal_flush,
   increase j_trans_id, reset j_num_trans and finally up_write. This
   function is used by the ocfs2cmt thread.

So in general, while a journal flush is running, no new transaction can be
started. Unfortunately this holds off the whole node and causes long delays
for some file operations.

I have hit a bug: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1281
After 30 days of ocfs2 usage, the system becomes slower and slower (why the
journal commit becomes so slow is still unknown; it may be related to file
system fragmentation), and a tiny open/truncate of a file takes around
10-30 secs. I don't think that is bearable for a user. After putting some
debug code in the kernel (great thanks to the user), I found that the
operation is blocked in ocfs2_start_trans.

The strace log shows:
22955 open("/usr/home/test_io_file_ow", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 1 <10.329676>

And from the system log:
Sep 24 17:28:30 192.168.0.4 kernel: (dd,22955,5):ocfs2_orphan_for_truncate:354 start transcation for inode 105572512
Sep 24 17:28:41 192.168.0.4 kernel: (dd,22955,5):ocfs2_orphan_for_truncate:362 journal access for inode 105572512

The code is like this:

	mlog(0, "start transcation for inode %llu\n",
	     OCFS2_I(inode)->ip_blkno);
	handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS);
	if (IS_ERR(handle)) {
		status = PTR_ERR(handle);
		mlog_errno(status);
		goto out;
	}
	mlog(0, "journal access for inode %llu\n",
	     OCFS2_I(inode)->ip_blkno);

So we spent 11 secs in ocfs2_start_trans!

From what I have investigated, j_trans_barrier is only used in a cluster
environment (for a locally mounted volume, ocfs2cmt isn't started and we
depend on jbd2 to flush the journal). It works together with j_trans_id to
make sure all modifications to a given ocfs2_caching_info have been flushed
(see ocfs2_ci_fully_checkpointed) when we downconvert a cluster lock. We
also call ocfs2_set_ci_lock_trans in journal_access so that we know the
last trans_id for a given ocfs2_caching_info.

My solution is:
1. Remove j_trans_barrier.
2. Add a flag ci_checkpointing in ocfs2_caching_info (a rough sketch of
   this flow follows below):
   1) When we find this caching_info needs a checkpoint, set this flag and
      start the checkpointing (in ocfs2_ci_checkpointed). The downconvert
      request will be requeued so that we can check and clear this flag
      the next time it is handled.
   2) Clear the flag when there is no more need for checkpointing this ci
      (also in ocfs2_ci_checkpointed) during check_downconvert.
3. Make sure that when we journal_access some blocks, the caching_info
   can't be in the checkpointing state. I think if we are checkpointing a
   caching_info, we shouldn't be able to journal_access it, since it has
   just been asked to downconvert and we shouldn't hold the lock any more.
   So perhaps a BUG_ON would work?

The above is the scenario and my solution. Any comments are welcome.

Regards,
Tao
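[A minimal sketch of how point 2 of the proposal could look in
check_downconvert, assuming the flow is "kick off the flush, requeue,
clear the flag once the journal has caught up." The struct,
journal_trans_id and kick_commit_thread below are simplified stand-ins
rather than the real ocfs2 symbols, and the trans_id comparison ignores
wrap-around; ci_checkpointing is the flag proposed above.]

	/*
	 * Sketch only: simplified stand-ins for the real ocfs2 state.
	 * ci_last_trans plays the role of the trans_id recorded by
	 * journal_access, journal_trans_id plays the role of j_trans_id.
	 */
	#include <stdbool.h>

	struct ci_sketch {
		unsigned long	ci_last_trans;		/* last journal_access trans_id */
		bool		ci_checkpointing;	/* proposed flag */
	};

	static unsigned long journal_trans_id;		/* stand-in for j_trans_id */

	static void kick_commit_thread(void)		/* stand-in: wake ocfs2cmt */
	{
	}

	/*
	 * Called from check_downconvert.  Returns true when the downconvert
	 * may proceed, false when it must be requeued until the checkpoint
	 * completes.
	 */
	static bool ci_checkpointed(struct ci_sketch *ci)
	{
		if (ci->ci_last_trans < journal_trans_id) {
			/* The journal has already flushed past our last change. */
			ci->ci_checkpointing = false;
			return true;
		}

		if (!ci->ci_checkpointing) {
			/* Mark it and kick off the flush; the caller requeues. */
			ci->ci_checkpointing = true;
			kick_commit_thread();
		}
		return false;
	}

Point 3 would then reduce to something like a BUG_ON(ci->ci_checkpointing)
in the journal_access path, as Tao suggests.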
On Wed, Oct 20, 2010 at 02:08:17PM +0800, Tao Ma wrote:
> j_trans_barrier in ocfs2 is used to protect some journal operations in
> ocfs2. Normally it is used as follows:
> 1. In journal transactions: when we start a transaction, we down_read it,
>    and j_num_trans is increased accordingly (in a cluster environment).
>    It is up_read when we do ocfs2_commit_trans.
> 2. In ocfs2_commit_cache: we down_write it, then call jbd2_journal_flush,
>    increase j_trans_id, reset j_num_trans and finally up_write. This
>    function is used by the ocfs2cmt thread.

<snip> slow filesystem... </snip>

> My solution is:
> 1. Remove j_trans_barrier.
> 2. Add a flag ci_checkpointing in ocfs2_caching_info:
>    1) When we find this caching_info needs a checkpoint, set this flag
>       and start the checkpointing (in ocfs2_ci_checkpointed). The
>       downconvert request will be requeued so that we can check and
>       clear this flag the next time it is handled.
>    2) Clear the flag when there is no more need for checkpointing this
>       ci (also in ocfs2_ci_checkpointed) during check_downconvert.
> 3. Make sure that when we journal_access some blocks, the caching_info
>    can't be in the checkpointing state. I think if we are checkpointing
>    a caching_info, we shouldn't be able to journal_access it, since it
>    has just been asked to downconvert and we shouldn't hold the lock any
>    more. So perhaps a BUG_ON would work?

Tao,
	I'm sorry I haven't responded sooner. This proposal didn't strike
me as quite right, and I didn't have time to think about it. I have a
couple of concerns.
	First, we don't always checkpoint from a downconvert. We do it in
clear_inode() as well, when we are flushing an inode from cache. This may
not have anything to do with the lock we care about, e.g. locks on other
inodes. What I mean is, the caching info for the inode we care about may
not be checkpointing, but the journal as a whole is. We need to stop all
action while that is happening.
	Second, there is the flip side. How do we wait until all open
transactions are complete before checkpointing? The down_write() in
ocfs2_commit_cache() blocks until all open transactions up_read(). In your
scheme, there is no care taken for open transactions against the journal.
Remember, the journal is global to the node.

Joel

--
Life's Little Instruction Book #464

	"Don't miss the magic of the moment by focusing on what's to come."

Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
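[A userspace analogy, not ocfs2 code and with made-up names, of the
second property Joel points at: with j_trans_barrier modelled as a
pthread rwlock, the writer in commit_cache() cannot start the flush until
every reader, i.e. every open transaction, has unlocked, and any
transaction started during the flush waits for the whole thing. Joel's
first concern, the clear_inode() checkpoint path, does not appear in this
analogy at all.]

	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	static pthread_rwlock_t trans_barrier = PTHREAD_RWLOCK_INITIALIZER;

	static void *transaction(void *arg)
	{
		pthread_rwlock_rdlock(&trans_barrier);	/* ocfs2_start_trans() */
		printf("transaction %ld modifying metadata\n", (long)arg);
		sleep(2);
		pthread_rwlock_unlock(&trans_barrier);	/* ocfs2_commit_trans() */
		return NULL;
	}

	static void commit_cache(void)
	{
		/* Blocks until every open transaction has up_read'd. */
		pthread_rwlock_wrlock(&trans_barrier);
		printf("flushing journal with no transactions open\n");
		sleep(2);	/* jbd2_journal_flush(); new transactions stall here */
		pthread_rwlock_unlock(&trans_barrier);
	}

	int main(void)
	{
		pthread_t t[2];
		long i;

		for (i = 0; i < 2; i++)
			pthread_create(&t[i], NULL, transaction, (void *)i);
		sleep(1);	/* let the transactions take the read lock */
		commit_cache();
		for (i = 0; i < 2; i++)
			pthread_join(t[i], NULL);
		return 0;
	}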
On Thu, Nov 25, 2010 at 02:08:22AM -0800, Joel Becker wrote:
> 	Second, there is the flip side. How do we wait until all open
> transactions are complete before checkpointing? The down_write() in
> ocfs2_commit_cache() blocks until all open transactions up_read(). In
> your scheme, there is no care taken for open transactions against the
> journal. Remember, the journal is global to the node.

	Hmm. I wonder if we can allow transactions as soon as we kick off
the journal? Basically, right now, we do the following:

1) down_write(trans_barrier)
   - Wait for all open transactions
   - Block all new transactions
2) jbd2_journal_flush()
   - Write out the journal
   - Wait on the journal flush
3) up_write(trans_barrier)
   - Unblock new transactions

	We absolutely need to wait for open transactions before starting
the flush. Otherwise, the transaction we need closed for a downconvert may
still be open. But do we need to keep blocking new transactions once the
journal flush is under way? Like, we could up_write() our transaction
barrier after calling journal_lock_updates(). Would that work? Would it
help?

Joel

--
"The cynics are right nine times out of ten."
	- H. L. Mencken

Joel Becker
Senior Development Manager
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127
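[A before/after sketch of the ordering Joel floats, again as a userspace
analogy with placeholder names: journal_lock_updates_stub() and friends
stand in for jbd2_journal_lock_updates()/flush()/unlock_updates() but are
not the jbd2 API. Whether the flush can really be split up this way, and
whether new handles make any progress while updates are locked, is
exactly the open question in the mail.]

	#include <pthread.h>

	static pthread_rwlock_t trans_barrier = PTHREAD_RWLOCK_INITIALIZER;

	/* Placeholders for the jbd2 lock_updates/write-out/unlock_updates steps. */
	static void journal_lock_updates_stub(void)   { /* drain running handles */ }
	static void journal_write_out_stub(void)      { /* write out and wait on the journal */ }
	static void journal_unlock_updates_stub(void) { /* let jbd2 accept handles again */ }

	/* Today: steps 1-3 from the mail; the barrier is held across the slow part. */
	static void commit_cache_today(void)
	{
		pthread_rwlock_wrlock(&trans_barrier);	/* 1) wait for/block transactions */
		journal_write_out_stub();		/* 2) jbd2_journal_flush() */
		pthread_rwlock_unlock(&trans_barrier);	/* 3) unblock transactions */
	}

	/* Proposed: drop the barrier once running transactions are drained. */
	static void commit_cache_proposed(void)
	{
		pthread_rwlock_wrlock(&trans_barrier);	/* wait for open transactions */
		journal_lock_updates_stub();		/* journal quiesced */
		pthread_rwlock_unlock(&trans_barrier);	/* barrier released early */

		/*
		 * New ocfs2_start_trans() callers no longer block on the
		 * barrier here; whether jbd2 lets their handles run before
		 * unlock_updates is the open question above.
		 */
		journal_write_out_stub();
		journal_unlock_updates_stub();
	}

	int main(void)
	{
		commit_cache_today();
		commit_cache_proposed();
		return 0;
	}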