Gang He
2017-Dec-14 05:14 UTC
[Ocfs2-devel] [PATCH v2 2/2] ocfs2: add trimfs lock to avoid duplicated trims in cluster
As you know, ocfs2 supports trimming the underlying disk via the
fstrim command. But there is a problem: ocfs2 is a shared-disk
cluster file system, and if the user configures a scheduled fstrim
job on each node, multiple nodes will trim the shared disk
simultaneously. This wastes CPU and IO, and might also negatively
affect the lifetime of poor-quality SSD devices.
Then, we introduce a trimfs dlm lock so that the nodes can
communicate in this case: only one fstrim command does the actual
trimming on the shared disk, while the fstrim commands from the
other nodes wait for the first one to finish and then return
success directly, instead of running the same trim on the shared
disk again.

Compared with the first version, I changed the fstrim command's
return value and behavior for the case where another fstrim
command is already running on the shared disk.

Signed-off-by: Gang He <ghe at suse.com>
---
 fs/ocfs2/alloc.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
index ab5105f..5c9c3e2 100644
--- a/fs/ocfs2/alloc.c
+++ b/fs/ocfs2/alloc.c
@@ -7382,6 +7382,7 @@ int ocfs2_trim_fs(struct super_block *sb, struct fstrim_range *range)
 	struct buffer_head *gd_bh = NULL;
 	struct ocfs2_dinode *main_bm;
 	struct ocfs2_group_desc *gd = NULL;
+	struct ocfs2_trim_fs_info info, *pinfo = NULL;
 
 	start = range->start >> osb->s_clustersize_bits;
 	len = range->len >> osb->s_clustersize_bits;
@@ -7419,6 +7420,42 @@ int ocfs2_trim_fs(struct super_block *sb, struct fstrim_range *range)
 
 	trace_ocfs2_trim_fs(start, len, minlen);
 
+	ocfs2_trim_fs_lock_res_init(osb);
+	ret = ocfs2_trim_fs_lock(osb, NULL, 1);
+	if (ret < 0) {
+		if (ret != -EAGAIN) {
+			mlog_errno(ret);
+			ocfs2_trim_fs_lock_res_uninit(osb);
+			goto out_unlock;
+		}
+
+		mlog(ML_NOTICE, "Wait for trim on device (%s) to "
+		     "finish, which is running from another node.\n",
+		     osb->dev_str);
+		ret = ocfs2_trim_fs_lock(osb, &info, 0);
+		if (ret < 0) {
+			mlog_errno(ret);
+			ocfs2_trim_fs_lock_res_uninit(osb);
+			goto out_unlock;
+		}
+
+		if (info.tf_valid && info.tf_success &&
+		    info.tf_start == start && info.tf_len == len &&
+		    info.tf_minlen == minlen) {
+			/* Avoid sending duplicated trim to a shared device */
+			mlog(ML_NOTICE, "The same trim on device (%s) was "
+			     "just done from node (%u), return.\n",
+			     osb->dev_str, info.tf_nodenum);
+			range->len = info.tf_trimlen;
+			goto out_trimunlock;
+		}
+	}
+
+	info.tf_nodenum = osb->node_num;
+	info.tf_start = start;
+	info.tf_len = len;
+	info.tf_minlen = minlen;
+
 	/* Determine first and last group to examine based on start and len */
 	first_group = ocfs2_which_cluster_group(main_bm_inode, start);
 	if (first_group == osb->first_cluster_group_blkno)
@@ -7463,6 +7500,13 @@ int ocfs2_trim_fs(struct super_block *sb, struct fstrim_range *range)
 		group += ocfs2_clusters_to_blocks(sb, osb->bitmap_cpg);
 	}
 	range->len = trimmed * sb->s_blocksize;
+
+	info.tf_trimlen = range->len;
+	info.tf_success = (ret ? 0 : 1);
+	pinfo = &info;
+out_trimunlock:
+	ocfs2_trim_fs_unlock(osb, pinfo);
+	ocfs2_trim_fs_lock_res_uninit(osb);
 out_unlock:
 	ocfs2_inode_unlock(main_bm_inode, 0);
 	brelse(main_bm_bh);
-- 
1.8.5.6
alex chen
2018-Jan-04 01:39 UTC
[Ocfs2-devel] [PATCH v2 2/2] ocfs2: add trimfs lock to avoid duplicated trims in cluster
Hi Gang,

On 2017/12/14 13:14, Gang He wrote:
> As you know, ocfs2 supports trimming the underlying disk via the
> fstrim command. But there is a problem: ocfs2 is a shared-disk
> cluster file system, and if the user configures a scheduled fstrim
> job on each node, multiple nodes will trim the shared disk
> simultaneously. This wastes CPU and IO, and might also negatively
> affect the lifetime of poor-quality SSD devices.
> Then, we introduce a trimfs dlm lock so that the nodes can
> communicate in this case: only one fstrim command does the actual
> trimming on the shared disk, while the fstrim commands from the
> other nodes wait for the first one to finish and then return
> success directly, instead of running the same trim on the shared
> disk again.

For the same purpose, can we take the global bitmap meta lock with
EXMODE instead of adding a new trimfs dlm lock?

Thanks,
Alex
Changwei Ge
2018-Jan-10 06:33 UTC
[Ocfs2-devel] [PATCH v2 2/2] ocfs2: add trimfs lock to avoid duplicated trims in cluster
Hi Gang,

On 2017/12/14 13:16, Gang He wrote:
> @@ -7382,6 +7382,7 @@ int ocfs2_trim_fs(struct super_block *sb, struct fstrim_range *range)
> 	struct buffer_head *gd_bh = NULL;
> 	struct ocfs2_dinode *main_bm;
> 	struct ocfs2_group_desc *gd = NULL;
> +	struct ocfs2_trim_fs_info info, *pinfo = NULL;

I think *pinfo* is not necessary.

> +	ocfs2_trim_fs_lock_res_init(osb);
> +	ret = ocfs2_trim_fs_lock(osb, NULL, 1);

I don't get why you try to lock here and, if that fails, acquire the
same lock again later and wait until it is granted.
Can't it just acquire the trimfs lock as a blocking one directly here?

> +		ret = ocfs2_trim_fs_lock(osb, &info, 0);
> +		if (ret < 0) {
> +			mlog_errno(ret);
> +			ocfs2_trim_fs_lock_res_uninit(osb);

In ocfs2_trim_fs_lock_res_uninit(), you drop the lock. But here it
was never granted. Do we still need to drop the lock resource?

> +	info.tf_nodenum = osb->node_num;
> +	info.tf_start = start;
> +	info.tf_len = len;
> +	info.tf_minlen = minlen;

If we failed while doing the trim, I think we should not cache the
above info in the LVB.

BTW, it seems that this patch is on top of the 'try lock' patches you
previously sent out. Are they related?

Thanks,
Changwei
Changwei Ge
2018-Jan-17 07:40 UTC
[Ocfs2-devel] [PATCH v2 2/2] ocfs2: add trimfs lock to avoid duplicated trims in cluster
It looks good to me.

Reviewed-by: Changwei Ge <ge.changwei at h3c.com>

On 2017/12/14 13:16, Gang He wrote:
> As you know, ocfs2 supports trimming the underlying disk via the
> fstrim command.