Tao Ma
2010-Sep-16 06:19 UTC
[Ocfs2-devel] [PATCH 0/3] ocfs2: Resolve the problem of truncate log flush.
Hi all, Recently, one of our colleagues meet with a problem that if we write/delete a 32mb files repeatly, we will get a ENOSPC in the end. And the corresponding bug is 1288. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1288 So this patch set just tries to resolve it. It includes 3 patches: 0001 is a separate one. It just adds a new watermark for truncate log, FLUSH_TRUNCATE_LOG_RATIO. So if the truncate log has collected too much clusters, ocfs2_truncate_log_needs_flush will tell the caller to flush immediately. 0002-0003 are a little complicate here. So in general, even we have freed the clusters to the global bitmap(by the above flush), we can't use them because they haven't been committed to the disk(check the comments above ocfs2_test_bg_bit_allocatable for details). So we have to tell jbd2 to flush immediately. 0002 reverts some change in commit c271c5c so that now even in local mode we can force jbd2 to flush immediately. 0003 add the 'checkpoint' to ocfs2_truncate_log_needs_flush, so that the callers know that we need to checkpoint after the flush. Actually 0003 can work without 0002 since in cluster env, ocfs2_start_checkpoint will work and 0002 is just used for a local mode. So feel free to drop 0002 if we decide to still keep ocfs2cmt out of local mode. Regards, Tao
Tao Ma
2010-Sep-16 06:21 UTC
[Ocfs2-devel] [PATCH 1/3] ocfs2: flush truncate log in case it contains too many clusters.
When we test whether we need to flush truncate log in ocfs2_truncate_log_needs_flush, we only take care of whether the truncate log is full. But if the volume is small and we have large block size(in this case truncate log can store too many records), we may be too late for flushing if the user create/write/delete files quickly. So I add a new FLUSH_TRUNCATE_LOG_RATIO so that we will also check whether the number of truncated clusters has reached a watermark, if yes, flush the truncate log. It resolves the ossbug #1288 somehow. Signed-off-by: Tao Ma <tao.ma at oracle.com> --- fs/ocfs2/alloc.c | 25 ++++++++++++++++++++++++- fs/ocfs2/ocfs2.h | 2 ++ 2 files changed, 26 insertions(+), 1 deletions(-) diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c index 592fae5..c765447 100644 --- a/fs/ocfs2/alloc.c +++ b/fs/ocfs2/alloc.c @@ -5751,11 +5751,13 @@ out: return ret; } +#define FLUSH_TRUNCATE_LOG_RATIO 10 int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb) { struct buffer_head *tl_bh = osb->osb_tl_bh; struct ocfs2_dinode *di; struct ocfs2_truncate_log *tl; + int flush = 0; di = (struct ocfs2_dinode *) tl_bh->b_data; tl = &di->id2.i_dealloc; @@ -5764,7 +5766,25 @@ int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb) "slot %d, invalid truncate log parameters: used = " "%u, count = %u\n", osb->slot_num, le16_to_cpu(tl->tl_used), le16_to_cpu(tl->tl_count)); - return le16_to_cpu(tl->tl_used) == le16_to_cpu(tl->tl_count); + if (le16_to_cpu(tl->tl_used) == le16_to_cpu(tl->tl_count)) + flush = 1; + else { + /* + * Check whether we have reserved enough clusters + * to flush. + */ + u32 watermark = osb->osb_clusters_at_boot / 100 * + FLUSH_TRUNCATE_LOG_RATIO; + + if (watermark < osb->truncated_clusters) { + mlog(0, "flush truncate log: watermark %u," + " we have %u clusters truncated\n", + watermark, osb->truncated_clusters); + flush = 1; + } + } + + return flush; } static int ocfs2_truncate_log_can_coalesce(struct ocfs2_truncate_log *tl, @@ -5858,6 +5878,7 @@ int ocfs2_truncate_log_append(struct ocfs2_super *osb, ocfs2_journal_dirty(handle, tl_bh); + osb->truncated_clusters += num_clusters; bail: mlog_exit(status); return status; @@ -5929,6 +5950,8 @@ static int ocfs2_replay_truncate_records(struct ocfs2_super *osb, i--; } + osb->truncated_clusters = 0; + bail: mlog_exit(status); return status; diff --git a/fs/ocfs2/ocfs2.h b/fs/ocfs2/ocfs2.h index c67003b..5f47883 100644 --- a/fs/ocfs2/ocfs2.h +++ b/fs/ocfs2/ocfs2.h @@ -425,6 +425,8 @@ struct ocfs2_super /* rb tree root for refcount lock. */ struct rb_root osb_rf_lock_tree; struct ocfs2_refcount_tree *osb_ref_tree_lru; + + unsigned int truncated_clusters; }; #define OCFS2_SB(sb) ((struct ocfs2_super *)(sb)->s_fs_info) -- 1.7.1.571.gba4d01
Tao Ma
2010-Sep-16 06:21 UTC
[Ocfs2-devel] [PATCH 2/3] ocfs2: Start ocfs2cmt even in local mode.
Currently in local mode, we don't start ocfs2cmt thread, so all the journal flush work is done by jbd2. But we now need to flush the journal if the truncate log has collected too much clusters because in this case all the bits which hasn't been committed can't be allocated again(check function ocfs2_test_bg_bit_allocatable for details). This patch just reverts some modification of commit c271c5c so that now we can start journal commit by ourselves by calling ocfs2_start_checkpoint. The changes include: 1. Start ocfs2cmt no matter whether we are in local mode or not. 2. Remove the 'local' parameter from ocfs2_journal_load since now we don't care. 3. Always increase j_num_trans in ocfs2_start_trans to make commit thread work. 4. Revert the change in ocfs2_journal_shutdown. Signed-off-by: Tao Ma <tao.ma at oracle.com> --- fs/ocfs2/journal.c | 45 ++++++++++++--------------------------------- fs/ocfs2/journal.h | 3 +-- fs/ocfs2/super.c | 5 +---- 3 files changed, 14 insertions(+), 39 deletions(-) diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index 9b57c03..e927532 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -374,10 +374,8 @@ handle_t *ocfs2_start_trans(struct ocfs2_super *osb, int max_buffs) ocfs2_abort(osb->sb, "Detected aborted journal"); handle = ERR_PTR(-EROFS); } - } else { - if (!ocfs2_mount_local(osb)) - atomic_inc(&(osb->journal->j_num_trans)); - } + } else + atomic_inc(&(osb->journal->j_num_trans)); return handle; } @@ -966,23 +964,9 @@ void ocfs2_journal_shutdown(struct ocfs2_super *osb) BUG_ON(atomic_read(&(osb->journal->j_num_trans)) != 0); - if (ocfs2_mount_local(osb)) { - jbd2_journal_lock_updates(journal->j_journal); - status = jbd2_journal_flush(journal->j_journal); - jbd2_journal_unlock_updates(journal->j_journal); - if (status < 0) - mlog_errno(status); - } - - if (status == 0) { - /* - * Do not toggle if flush was unsuccessful otherwise - * will leave dirty metadata in a "clean" journal - */ - status = ocfs2_journal_toggle_dirty(osb, 0, 0); - if (status < 0) - mlog_errno(status); - } + status = ocfs2_journal_toggle_dirty(osb, 0, 0); + if (status < 0) + mlog_errno(status); /* Shutdown the kernel journal system */ jbd2_journal_destroy(journal->j_journal); @@ -1023,7 +1007,7 @@ static void ocfs2_clear_journal_error(struct super_block *sb, } } -int ocfs2_journal_load(struct ocfs2_journal *journal, int local, int replayed) +int ocfs2_journal_load(struct ocfs2_journal *journal, int replayed) { int status = 0; struct ocfs2_super *osb; @@ -1049,18 +1033,13 @@ int ocfs2_journal_load(struct ocfs2_journal *journal, int local, int replayed) } /* Launch the commit thread */ - if (!local) { - osb->commit_task = kthread_run(ocfs2_commit_thread, osb, - "ocfs2cmt"); - if (IS_ERR(osb->commit_task)) { - status = PTR_ERR(osb->commit_task); - osb->commit_task = NULL; - mlog(ML_ERROR, "unable to launch ocfs2commit thread, " - "error=%d", status); - goto done; - } - } else + osb->commit_task = kthread_run(ocfs2_commit_thread, osb, "ocfs2cmt"); + if (IS_ERR(osb->commit_task)) { + status = PTR_ERR(osb->commit_task); osb->commit_task = NULL; + mlog(ML_ERROR, "unable to launch ocfs2commit thread, " + "error=%d", status); + } done: mlog_exit(status); diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h index b5baaa8..b960d5a 100644 --- a/fs/ocfs2/journal.h +++ b/fs/ocfs2/journal.h @@ -188,8 +188,7 @@ int ocfs2_journal_init(struct ocfs2_journal *journal, void ocfs2_journal_shutdown(struct ocfs2_super *osb); int ocfs2_journal_wipe(struct ocfs2_journal *journal, int full); -int ocfs2_journal_load(struct ocfs2_journal *journal, int local, - int replayed); +int ocfs2_journal_load(struct ocfs2_journal *journal, int replayed); int ocfs2_check_journals_nolocks(struct ocfs2_super *osb); void ocfs2_recovery_thread(struct ocfs2_super *osb, int node_num); diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index fa1be1b..3aa44cf 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -2366,7 +2366,6 @@ static int ocfs2_check_volume(struct ocfs2_super *osb) { int status; int dirty; - int local; struct ocfs2_dinode *local_alloc = NULL; /* only used if we * recover * ourselves. */ @@ -2394,10 +2393,8 @@ static int ocfs2_check_volume(struct ocfs2_super *osb) "recovering volume.\n"); } - local = ocfs2_mount_local(osb); - /* will play back anything left in the journal. */ - status = ocfs2_journal_load(osb->journal, local, dirty); + status = ocfs2_journal_load(osb->journal, dirty); if (status < 0) { mlog(ML_ERROR, "ocfs2 journal load failed! %d\n", status); goto finally; -- 1.7.1.571.gba4d01
Tao Ma
2010-Sep-16 06:22 UTC
[Ocfs2-devel] [PATCH 3/3] ocfs2: Start journal checkpoint if we have too many truncated clusters.
Add a new para 'checkpoint' in ocfs2_truncate_log_needs_flush, if it finds we need to checkpoint the journal, start it. For xattr truncate, I don't pass the parameter now since the value now has a limit of 64K, so I don't think we need a checkpoint there. Signed-off-by: Tao Ma <tao.ma at oracle.com> --- fs/ocfs2/alloc.c | 25 ++++++++++++++++++------- fs/ocfs2/alloc.h | 2 +- fs/ocfs2/xattr.c | 4 ++-- 3 files changed, 21 insertions(+), 10 deletions(-) diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c index c765447..218bc2b 100644 --- a/fs/ocfs2/alloc.c +++ b/fs/ocfs2/alloc.c @@ -5646,7 +5646,7 @@ int ocfs2_remove_btree_range(struct inode *inode, struct ocfs2_cached_dealloc_ctxt *dealloc, u64 refcount_loc) { - int ret, credits = 0, extra_blocks = 0; + int ret, credits = 0, extra_blocks = 0, checkpoint = 0; u64 phys_blkno = ocfs2_clusters_to_blocks(inode->i_sb, phys_cpos); struct ocfs2_super *osb = OCFS2_SB(inode->i_sb); struct inode *tl_inode = osb->osb_tl_inode; @@ -5686,7 +5686,7 @@ int ocfs2_remove_btree_range(struct inode *inode, mutex_lock(&tl_inode->i_mutex); - if (ocfs2_truncate_log_needs_flush(osb)) { + if (ocfs2_truncate_log_needs_flush(osb, &checkpoint)) { ret = __ocfs2_flush_truncate_log(osb); if (ret < 0) { mlog_errno(ret); @@ -5748,11 +5748,14 @@ out: if (ref_tree) ocfs2_unlock_refcount_tree(osb, ref_tree, 1); + if (!ret && checkpoint) + ocfs2_start_checkpoint(osb); + return ret; } #define FLUSH_TRUNCATE_LOG_RATIO 10 -int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb) +int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb, int *checkpoint) { struct buffer_head *tl_bh = osb->osb_tl_bh; struct ocfs2_dinode *di; @@ -5781,6 +5784,9 @@ int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb) " we have %u clusters truncated\n", watermark, osb->truncated_clusters); flush = 1; + + if (checkpoint) + *checkpoint = 1; } } @@ -6187,7 +6193,7 @@ bail: int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb, struct ocfs2_dinode *tl_copy) { - int status = 0; + int status = 0, checkpoint = 0; int i; unsigned int clusters, num_recs, start_cluster; u64 start_blk; @@ -6209,7 +6215,7 @@ int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb, mutex_lock(&tl_inode->i_mutex); for(i = 0; i < num_recs; i++) { - if (ocfs2_truncate_log_needs_flush(osb)) { + if (ocfs2_truncate_log_needs_flush(osb, &checkpoint)) { status = __ocfs2_flush_truncate_log(osb); if (status < 0) { mlog_errno(status); @@ -6240,6 +6246,9 @@ int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb, bail_up: mutex_unlock(&tl_inode->i_mutex); + if (!status && checkpoint) + ocfs2_start_checkpoint(osb); + mlog_exit(status); return status; } @@ -6442,12 +6451,12 @@ static int ocfs2_free_cached_clusters(struct ocfs2_super *osb, struct ocfs2_cached_block_free *tmp; struct inode *tl_inode = osb->osb_tl_inode; handle_t *handle; - int ret = 0; + int ret = 0, checkpoint = 0; mutex_lock(&tl_inode->i_mutex); while (head) { - if (ocfs2_truncate_log_needs_flush(osb)) { + if (ocfs2_truncate_log_needs_flush(osb, &checkpoint)) { ret = __ocfs2_flush_truncate_log(osb); if (ret < 0) { mlog_errno(ret); @@ -6485,6 +6494,8 @@ static int ocfs2_free_cached_clusters(struct ocfs2_super *osb, kfree(tmp); } + if (!ret && checkpoint) + ocfs2_start_checkpoint(osb); return ret; } diff --git a/fs/ocfs2/alloc.h b/fs/ocfs2/alloc.h index 55762b5..967ee21 100644 --- a/fs/ocfs2/alloc.h +++ b/fs/ocfs2/alloc.h @@ -182,7 +182,7 @@ int ocfs2_begin_truncate_log_recovery(struct ocfs2_super *osb, struct ocfs2_dinode **tl_copy); int ocfs2_complete_truncate_log_recovery(struct ocfs2_super *osb, struct ocfs2_dinode *tl_copy); -int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb); +int ocfs2_truncate_log_needs_flush(struct ocfs2_super *osb, int *checkpoint); int ocfs2_truncate_log_append(struct ocfs2_super *osb, handle_t *handle, u64 start_blk, diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c index d03469f..82f1fa2 100644 --- a/fs/ocfs2/xattr.c +++ b/fs/ocfs2/xattr.c @@ -3595,7 +3595,7 @@ int ocfs2_xattr_set(struct inode *inode, mutex_lock(&tl_inode->i_mutex); - if (ocfs2_truncate_log_needs_flush(osb)) { + if (ocfs2_truncate_log_needs_flush(osb, NULL)) { ret = __ocfs2_flush_truncate_log(osb); if (ret < 0) { mutex_unlock(&tl_inode->i_mutex); @@ -5447,7 +5447,7 @@ static int ocfs2_rm_xattr_cluster(struct inode *inode, mutex_lock(&tl_inode->i_mutex); - if (ocfs2_truncate_log_needs_flush(osb)) { + if (ocfs2_truncate_log_needs_flush(osb, NULL)) { ret = __ocfs2_flush_truncate_log(osb); if (ret < 0) { mlog_errno(ret); -- 1.7.1.571.gba4d01
Joel Becker
2010-Sep-16 06:41 UTC
[Ocfs2-devel] [PATCH 0/3] ocfs2: Resolve the problem of truncate log flush.
On Thu, Sep 16, 2010 at 02:19:51PM +0800, Tao Ma wrote:> 0002-0003 are a little complicate here. > So in general, even we have freed the clusters to the global > bitmap(by the above flush), we can't use them because they haven't > been committed to the disk(check the comments above > ocfs2_test_bg_bit_allocatable for details). So we have to tell jbd2 > to flush immediately.Mark, Didn't we do something to allow our allocation to re-use these clusters? Or was that for "thought to be allocated but never actually used" clusters?> 0002 reverts some change in commit c271c5c so that now even in local > mode we can force jbd2 to flush immediately. > 0003 add the 'checkpoint' to ocfs2_truncate_log_needs_flush, so that > the callers know that we need to checkpoint after the flush. > > Actually 0003 can work without 0002 since in cluster env, > ocfs2_start_checkpoint will work and 0002 is just used for a local > mode. So feel free to drop 0002 if we decide to still keep ocfs2cmt > out of local mode.Assuming we want to force jbd2 out, which is heavy as you point out, can't we just sync the filesystem? Wouldn't that work without adding ocfs2cmt? Joel -- "I am working for the time when unqualified blacks, browns, and women join the unqualified men in running our government." - Sissy Farenthold Joel Becker Consulting Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127