thr3ads.net - Ocfs2 devel - [Ocfs2-devel] [PATCH 0/3] ocfs2: Resolve the problem of truncate log flush. [Sep 2010]

If this information is useful, please help other people find it:
Share via:

Tao Ma

2010-Sep-16 06:19 UTC

[Ocfs2-devel] [PATCH 0/3] ocfs2: Resolve the problem of truncate log flush.

Hi all,
	Recently, one of our colleagues meet with a problem that if we 
write/delete a 32mb files repeatly, we will get a ENOSPC in the end. And 
the corresponding bug is 1288.
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1288

So this patch set just tries to resolve it. It includes 3 patches:
0001 is a separate one. It just adds a new watermark for truncate log, 
FLUSH_TRUNCATE_LOG_RATIO. So if the truncate log has collected too much 
clusters, ocfs2_truncate_log_needs_flush will tell the caller to flush 
immediately.

0002-0003 are a little complicate here.
So in general, even we have freed the clusters to the global bitmap(by 
the above flush), we can't use them because they haven't been committed 
to the disk(check the comments above ocfs2_test_bg_bit_allocatable for 
details). So we have to tell jbd2 to flush immediately.
0002 reverts some change in commit c271c5c so that now even in local 
mode we can force jbd2 to flush immediately.
0003 add the 'checkpoint' to ocfs2_truncate_log_needs_flush, so that the
callers know that we need to checkpoint after the flush.

Actually 0003 can work without 0002 since in cluster env, 
ocfs2_start_checkpoint will work and 0002 is just used for a local mode. 
So feel free to drop 0002 if we decide to still keep ocfs2cmt out of 
local mode.

Regards,
Tao

Tao Ma

2010-Sep-16 06:21 UTC

head link

[Ocfs2-devel] [PATCH 1/3] ocfs2: flush truncate log in case it contains too many clusters.

When we test whether we need to flush truncate log in
ocfs2_truncate_log_needs_flush, we only take care of
whether the truncate log is full. But if the volume is
small and  we have large block size(in this case truncate
log can store too many records), we may be too late
for flushing if the user create/write/delete files quickly.

So I add a new FLUSH_TRUNCATE_LOG_RATIO so that we will also
check whether the number of truncated clusters has reached
a watermark, if yes, flush the truncate log.
It resolves the ossbug #1288 somehow.

Signed-off-by: Tao Ma <tao.ma at oracle.com>
---
 fs/ocfs2/alloc.c |   25 ++++++++++++++++++++++++-
 fs/ocfs2/ocfs2.h |    2 ++
 2 files changed, 26 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index 592fae5..c765447 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5751,11 +5751,13 @@ out:
 	return ret;
 }
 
+#define FLUSH_TRUNCATE_LOG_RATIO	10
 int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb)
 {
 	struct buffer_head *tl_bh = osb->osb_tl_bh;
 	struct ocfs2_dinode *di;
 	struct ocfs2_truncate_log *tl;
+	int flush = 0;
 
 	di = (struct ocfs2_dinode *) tl_bh->b_data;
 	tl = &di->id2.i_dealloc;
@@ -5764,7 +5766,25 @@ int ocfs2_truncate_log_needs_flush(struct ocfs2_super
*osb)
 			"slot %d, invalid truncate log parameters: used = "
 			"%u, count = %u\n", osb->slot_num,
 			le16_to_cpu(tl->tl_used), le16_to_cpu(tl->tl_count));
-	return le16_to_cpu(tl->tl_used) == le16_to_cpu(tl->tl_count);
+	if (le16_to_cpu(tl->tl_used) == le16_to_cpu(tl->tl_count))
+		flush = 1;
+	else {
+		/*
+		 * Check whether we have reserved enough clusters
+		 * to flush.
+		 */
+		u32 watermark = osb->osb_clusters_at_boot / 100 *
+				FLUSH_TRUNCATE_LOG_RATIO;
+
+		if (watermark < osb->truncated_clusters) {
+			mlog(0, "flush truncate log: watermark %u,"
+			     " we have %u clusters truncated\n",
+			     watermark, osb->truncated_clusters);
+			flush = 1;
+		}
+	}
+
+	return flush;
 }
 
 static int ocfs2_truncate_log_can_coalesce(struct ocfs2_truncate_log *tl,
@@ -5858,6 +5878,7 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb,
 
 	ocfs2_journal_dirty(handle, tl_bh);
 
+	osb->truncated_clusters += num_clusters;
 bail:
 	mlog_exit(status);
 	return status;
@@ -5929,6 +5950,8 @@ static int ocfs2_replay_truncate_records(struct
ocfs2_super *osb,
 		i--;
 	}
 
+	osb->truncated_clusters = 0;
+
 bail:
 	mlog_exit(status);
 	return status;
diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h
index c67003b..5f47883 100644
--- a/fs/ocfs2/ocfs2.h
+++ b/fs/ocfs2/ocfs2.h
@@ -425,6 +425,8 @@ struct ocfs2_super
 	/* rb tree root for refcount lock. */
 	struct rb_root	osb_rf_lock_tree;
 	struct ocfs2_refcount_tree *osb_ref_tree_lru;
+
+	unsigned int truncated_clusters;
 };
 
 #define OCFS2_SB(sb)	    ((struct ocfs2_super *)(sb)->s_fs_info)
-- 
1.7.1.571.gba4d01

Tao Ma

2010-Sep-16 06:21 UTC

head link

[Ocfs2-devel] [PATCH 2/3] ocfs2: Start ocfs2cmt even in local mode.

Currently in local mode, we don't start ocfs2cmt thread,
so all the journal flush work is done by jbd2.

But we now need to flush the journal if the truncate log has
collected too much clusters because in this case all the bits
which hasn't been committed can't be allocated again(check
function ocfs2_test_bg_bit_allocatable for details).

This patch just reverts some modification of commit c271c5c
so that now we can start journal commit by ourselves by calling
ocfs2_start_checkpoint. The changes include:
1. Start ocfs2cmt no matter whether we are in local mode or not.
2. Remove the 'local' parameter from ocfs2_journal_load since now
   we don't care.
3. Always increase j_num_trans in ocfs2_start_trans to make commit
   thread work.
4. Revert the change in ocfs2_journal_shutdown.

Signed-off-by: Tao Ma <tao.ma at oracle.com>
---
 fs/ocfs2/journal.c |   45 ++++++++++++---------------------------------
 fs/ocfs2/journal.h |    3 +--
 fs/ocfs2/super.c   |    5 +----
 3 files changed, 14 insertions(+), 39 deletions(-)

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 9b57c03..e927532 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -374,10 +374,8 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb, int
max_buffs)
 			ocfs2_abort(osb->sb, "Detected aborted journal");
 			handle = ERR_PTR(-EROFS);
 		}
-	} else {
-		if (!ocfs2_mount_local(osb))
-			atomic_inc(&(osb->journal->j_num_trans));
-	}
+	} else
+		atomic_inc(&(osb->journal->j_num_trans));
 
 	return handle;
 }
@@ -966,23 +964,9 @@ void ocfs2_journal_shutdown(struct ocfs2_super *osb)
 
 	BUG_ON(atomic_read(&(osb->journal->j_num_trans)) != 0);
 
-	if (ocfs2_mount_local(osb)) {
-		jbd2_journal_lock_updates(journal->j_journal);
-		status = jbd2_journal_flush(journal->j_journal);
-		jbd2_journal_unlock_updates(journal->j_journal);
-		if (status < 0)
-			mlog_errno(status);
-	}
-
-	if (status == 0) {
-		/*
-		 * Do not toggle if flush was unsuccessful otherwise
-		 * will leave dirty metadata in a "clean" journal
-		 */
-		status = ocfs2_journal_toggle_dirty(osb, 0, 0);
-		if (status < 0)
-			mlog_errno(status);
-	}
+	status = ocfs2_journal_toggle_dirty(osb, 0, 0);
+	if (status < 0)
+		mlog_errno(status);
 
 	/* Shutdown the kernel journal system */
 	jbd2_journal_destroy(journal->j_journal);
@@ -1023,7 +1007,7 @@ static void ocfs2_clear_journal_error(struct super_block
*sb,
 	}
 }
 
-int ocfs2_journal_load(struct ocfs2_journal *journal, int local, int replayed)
+int ocfs2_journal_load(struct ocfs2_journal *journal, int replayed)
 {
 	int status = 0;
 	struct ocfs2_super *osb;
@@ -1049,18 +1033,13 @@ int ocfs2_journal_load(struct ocfs2_journal *journal,
int local, int replayed)
 	}
 
 	/* Launch the commit thread */
-	if (!local) {
-		osb->commit_task = kthread_run(ocfs2_commit_thread, osb,
-					       "ocfs2cmt");
-		if (IS_ERR(osb->commit_task)) {
-			status = PTR_ERR(osb->commit_task);
-			osb->commit_task = NULL;
-			mlog(ML_ERROR, "unable to launch ocfs2commit thread, "
-			     "error=%d", status);
-			goto done;
-		}
-	} else
+	osb->commit_task = kthread_run(ocfs2_commit_thread, osb,
"ocfs2cmt");
+	if (IS_ERR(osb->commit_task)) {
+		status = PTR_ERR(osb->commit_task);
 		osb->commit_task = NULL;
+		mlog(ML_ERROR, "unable to launch ocfs2commit thread, "
+		     "error=%d", status);
+	}
 
 done:
 	mlog_exit(status);
diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h
index b5baaa8..b960d5a 100644
--- a/fs/ocfs2/journal.h
+++ b/fs/ocfs2/journal.h
@@ -188,8 +188,7 @@ int    ocfs2_journal_init(struct ocfs2_journal *journal,
 void   ocfs2_journal_shutdown(struct ocfs2_super *osb);
 int    ocfs2_journal_wipe(struct ocfs2_journal *journal,
 			  int full);
-int    ocfs2_journal_load(struct ocfs2_journal *journal, int local,
-			  int replayed);
+int    ocfs2_journal_load(struct ocfs2_journal *journal, int replayed);
 int    ocfs2_check_journals_nolocks(struct ocfs2_super *osb);
 void   ocfs2_recovery_thread(struct ocfs2_super *osb,
 			     int node_num);
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index fa1be1b..3aa44cf 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -2366,7 +2366,6 @@ static int ocfs2_check_volume(struct ocfs2_super *osb)
 {
 	int status;
 	int dirty;
-	int local;
 	struct ocfs2_dinode *local_alloc = NULL; /* only used if we
 						  * recover
 						  * ourselves. */
@@ -2394,10 +2393,8 @@ static int ocfs2_check_volume(struct ocfs2_super *osb)
 		     "recovering volume.\n");
 	}
 
-	local = ocfs2_mount_local(osb);
-
 	/* will play back anything left in the journal. */
-	status = ocfs2_journal_load(osb->journal, local, dirty);
+	status = ocfs2_journal_load(osb->journal, dirty);
 	if (status < 0) {
 		mlog(ML_ERROR, "ocfs2 journal load failed! %d\n", status);
 		goto finally;
-- 
1.7.1.571.gba4d01

Tao Ma

2010-Sep-16 06:22 UTC

head link

[Ocfs2-devel] [PATCH 3/3] ocfs2: Start journal checkpoint if we have too many truncated clusters.

Add a new para 'checkpoint' in ocfs2_truncate_log_needs_flush,
if it finds we need to checkpoint the journal, start it.

For xattr truncate, I don't pass the parameter now since the value
now has a limit of 64K, so I don't think we need a checkpoint there.

Signed-off-by: Tao Ma <tao.ma at oracle.com>
---
 fs/ocfs2/alloc.c |   25 ++++++++++++++++++-------
 fs/ocfs2/alloc.h |    2 +-
 fs/ocfs2/xattr.c |    4 ++--
 3 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index c765447..218bc2b 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -5646,7 +5646,7 @@ int ocfs2_remove_btree_range(struct inode *inode,
 			     struct ocfs2_cached_dealloc_ctxt *dealloc,
 			     u64 refcount_loc)
 {
-	int ret, credits = 0, extra_blocks = 0;
+	int ret, credits = 0, extra_blocks = 0, checkpoint = 0;
 	u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos);
 	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
 	struct inode *tl_inode = osb->osb_tl_inode;
@@ -5686,7 +5686,7 @@ int ocfs2_remove_btree_range(struct inode *inode,
 
 	mutex_lock(&tl_inode->i_mutex);
 
-	if (ocfs2_truncate_log_needs_flush(osb)) {
+	if (ocfs2_truncate_log_needs_flush(osb, &checkpoint)) {
 		ret = __ocfs2_flush_truncate_log(osb);
 		if (ret < 0) {
 			mlog_errno(ret);
@@ -5748,11 +5748,14 @@ out:
 	if (ref_tree)
 		ocfs2_unlock_refcount_tree(osb, ref_tree, 1);
 
+	if (!ret && checkpoint)
+		ocfs2_start_checkpoint(osb);
+
 	return ret;
 }
 
 #define FLUSH_TRUNCATE_LOG_RATIO	10
-int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb)
+int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb, int *checkpoint)
 {
 	struct buffer_head *tl_bh = osb->osb_tl_bh;
 	struct ocfs2_dinode *di;
@@ -5781,6 +5784,9 @@ int ocfs2_truncate_log_needs_flush(struct ocfs2_super
*osb)
 			     " we have %u clusters truncated\n",
 			     watermark, osb->truncated_clusters);
 			flush = 1;
+
+			if (checkpoint)
+				*checkpoint = 1;
 		}
 	}
 
@@ -6187,7 +6193,7 @@ bail:
 int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb,
 					 struct ocfs2_dinode *tl_copy)
 {
-	int status = 0;
+	int status = 0, checkpoint = 0;
 	int i;
 	unsigned int clusters, num_recs, start_cluster;
 	u64 start_blk;
@@ -6209,7 +6215,7 @@ int ocfs2_complete_truncate_log_recovery(struct
ocfs2_super *osb,
 
 	mutex_lock(&tl_inode->i_mutex);
 	for(i = 0; i < num_recs; i++) {
-		if (ocfs2_truncate_log_needs_flush(osb)) {
+		if (ocfs2_truncate_log_needs_flush(osb, &checkpoint)) {
 			status = __ocfs2_flush_truncate_log(osb);
 			if (status < 0) {
 				mlog_errno(status);
@@ -6240,6 +6246,9 @@ int ocfs2_complete_truncate_log_recovery(struct
ocfs2_super *osb,
 bail_up:
 	mutex_unlock(&tl_inode->i_mutex);
 
+	if (!status && checkpoint)
+		ocfs2_start_checkpoint(osb);
+
 	mlog_exit(status);
 	return status;
 }
@@ -6442,12 +6451,12 @@ static int ocfs2_free_cached_clusters(struct ocfs2_super
*osb,
 	struct ocfs2_cached_block_free *tmp;
 	struct inode *tl_inode = osb->osb_tl_inode;
 	handle_t *handle;
-	int ret = 0;
+	int ret = 0, checkpoint = 0;
 
 	mutex_lock(&tl_inode->i_mutex);
 
 	while (head) {
-		if (ocfs2_truncate_log_needs_flush(osb)) {
+		if (ocfs2_truncate_log_needs_flush(osb, &checkpoint)) {
 			ret = __ocfs2_flush_truncate_log(osb);
 			if (ret < 0) {
 				mlog_errno(ret);
@@ -6485,6 +6494,8 @@ static int ocfs2_free_cached_clusters(struct ocfs2_super
*osb,
 		kfree(tmp);
 	}
 
+	if (!ret && checkpoint)
+		ocfs2_start_checkpoint(osb);
 	return ret;
 }
 
diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h
index 55762b5..967ee21 100644
--- a/fs/ocfs2/alloc.h
+++ b/fs/ocfs2/alloc.h
@@ -182,7 +182,7 @@ int ocfs2_begin_truncate_log_recovery(struct ocfs2_super
*osb,
 				      struct ocfs2_dinode **tl_copy);
 int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb,
 					 struct ocfs2_dinode *tl_copy);
-int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb);
+int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb, int *checkpoint);
 int ocfs2_truncate_log_append(struct ocfs2_super *osb,
 			      handle_t *handle,
 			      u64 start_blk,
diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c
index d03469f..82f1fa2 100644
--- a/fs/ocfs2/xattr.c
+++ b/fs/ocfs2/xattr.c
@@ -3595,7 +3595,7 @@ int ocfs2_xattr_set(struct inode *inode,
 
 	mutex_lock(&tl_inode->i_mutex);
 
-	if (ocfs2_truncate_log_needs_flush(osb)) {
+	if (ocfs2_truncate_log_needs_flush(osb, NULL)) {
 		ret = __ocfs2_flush_truncate_log(osb);
 		if (ret < 0) {
 			mutex_unlock(&tl_inode->i_mutex);
@@ -5447,7 +5447,7 @@ static int ocfs2_rm_xattr_cluster(struct inode *inode,
 
 	mutex_lock(&tl_inode->i_mutex);
 
-	if (ocfs2_truncate_log_needs_flush(osb)) {
+	if (ocfs2_truncate_log_needs_flush(osb, NULL)) {
 		ret = __ocfs2_flush_truncate_log(osb);
 		if (ret < 0) {
 			mlog_errno(ret);
-- 
1.7.1.571.gba4d01

Joel Becker

2010-Sep-16 06:41 UTC

head link

[Ocfs2-devel] [PATCH 0/3] ocfs2: Resolve the problem of truncate log flush.

On Thu, Sep 16, 2010 at 02:19:51PM +0800, Tao Ma wrote:> 0002-0003 are a little complicate here.
> So in general, even we have freed the clusters to the global
> bitmap(by the above flush), we can't use them because they haven't
> been committed to the disk(check the comments above
> ocfs2_test_bg_bit_allocatable for details). So we have to tell jbd2
> to flush immediately.
Mark,
	Didn't we do something to allow our allocation to re-use these
clusters?  Or was that for "thought to be allocated but never actually
used" clusters?
> 0002 reverts some change in commit c271c5c so that now even in local
> mode we can force jbd2 to flush immediately.
> 0003 add the 'checkpoint' to ocfs2_truncate_log_needs_flush, so
that
> the callers know that we need to checkpoint after the flush.
> 
> Actually 0003 can work without 0002 since in cluster env,
> ocfs2_start_checkpoint will work and 0002 is just used for a local
> mode. So feel free to drop 0002 if we decide to still keep ocfs2cmt
> out of local mode.
	Assuming we want to force jbd2 out, which is heavy as you point
out, can't we just sync the filesystem?  Wouldn't that work without
adding ocfs2cmt?

Joel

-- 

"I am working for the time when unqualified blacks, browns, and
 women join the unqualified men in running our government."
	- Sissy Farenthold

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127

Ocfs2 devel - Sep 2010 - [PATCH 0/3] ocfs2: Resolve the problem of truncate log flush.

[Ocfs2-devel] [PATCH 0/3] ocfs2: Resolve the problem of truncate log flush.

[Ocfs2-devel] [PATCH 1/3] ocfs2: flush truncate log in case it contains too many clusters.

[Ocfs2-devel] [PATCH 2/3] ocfs2: Start ocfs2cmt even in local mode.

[Ocfs2-devel] [PATCH 3/3] ocfs2: Start journal checkpoint if we have too many truncated clusters.

[Ocfs2-devel] [PATCH 0/3] ocfs2: Resolve the problem of truncate log flush.