Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 0/9] btrfs: Replace the btrfs_workers with kernel workqueue
Use the kernel workqueue, plus a new kernel-workqueue-based
btrfs_workqueue_struct, to replace the old btrfs_workers. The main goal
is to reduce redundant code (about 800 lines vs 200 lines) and to
benefit from the latest workqueue improvements.

As for performance, the test suite used is bonnie++, and there seems to
be no significant regression. The patched kernel shows the following
differences vs the 3.10 kernel on an HDD in a two-way 4-core server
(10 runs each, averages compared):

putc:          -0.97%
getc:          +1.48%
random_del:    +2.38%
random_create: -2.27%
seq_del:       +0.94%

All other changes are smaller than 0.5% and can be ignored. Since these
tests are limited and may be noisy, any further testing is welcome.

------
Changelog:
v1->v2:
  In patch 2/9:
  - Add "ret = -ENOMEM" for some workqueue allocations in scrub.c.
  - Add qgroup_rescan_workers allocation check.
------

Qu Wenruo (9):
  btrfs: Cleanup the unused struct async_sched.
  btrfs: use kernel workqueue to replace the btrfs_workers functions
  btrfs: Add btrfs_workqueue_struct to implement ordered execution
    based on kernel workqueue
  btrfs: Add high priority workqueue support for btrfs_workqueue_struct
  btrfs: Use btrfs_workqueue_struct to replace the fs_info->workers
  btrfs: Use btrfs_workqueue_struct to replace the
    fs_info->delalloc_workers
  btrfs: Replace the fs_info->submit_workers with kernel workqueue.
  btrfs: Cleanup the old btrfs workqueue
  btrfs: Replace thread_pool_size with workqueue default value

 fs/btrfs/Makefile        |   5 +-
 fs/btrfs/async-thread.c  | 714 -----------------------------------------
 fs/btrfs/async-thread.h  | 119 --------
 fs/btrfs/bwq.c           | 136 +++++++++
 fs/btrfs/bwq.h           |  67 +++++
 fs/btrfs/ctree.h         |  46 ++-
 fs/btrfs/delayed-inode.c |   9 +-
 fs/btrfs/dev-replace.c   |   1 -
 fs/btrfs/disk-io.c       | 238 ++++++----------
 fs/btrfs/extent-tree.c   |   6 +-
 fs/btrfs/inode.c         |  57 ++--
 fs/btrfs/ordered-data.c  |  11 +-
 fs/btrfs/ordered-data.h  |   4 +-
 fs/btrfs/qgroup.c        |  16 +-
 fs/btrfs/raid56.c        |  38 ++-
 fs/btrfs/reada.c         |   8 +-
 fs/btrfs/relocation.c    |   1 -
 fs/btrfs/scrub.c         |  78 +++---
 fs/btrfs/super.c         |  41 ++-
 fs/btrfs/volumes.c       |  25 +-
 fs/btrfs/volumes.h       |   3 +-
 21 files changed, 451 insertions(+), 1172 deletions(-)
 delete mode 100644 fs/btrfs/async-thread.c
 delete mode 100644 fs/btrfs/async-thread.h
 create mode 100644 fs/btrfs/bwq.c
 create mode 100644 fs/btrfs/bwq.h

--
1.8.4
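For readers who want the shape of the conversion without reading the
full diffs, here is a minimal sketch of the pattern the series applies
wherever a btrfs_workers user becomes a plain kernel workqueue user.
my_item, my_fn and my_wq are illustrative names, not identifiers from
the series; INIT_WORK(), queue_work() and alloc_workqueue() are the
stock kernel workqueue APIs the patches switch to.

	#include <linux/workqueue.h>

	struct my_item {
		int payload;
		struct work_struct work;   /* was: struct btrfs_work work; */
	};

	/* was: static void my_fn(struct btrfs_work *work) */
	static void my_fn(struct work_struct *work)
	{
		struct my_item *item = container_of(work, struct my_item, work);

		/* ... process item->payload ... */
	}

	static void my_queue(struct workqueue_struct *my_wq, struct my_item *item)
	{
		/* was: item->work.func = my_fn; item->work.flags = 0; */
		INIT_WORK(&item->work, my_fn);
		/* was: btrfs_queue_worker(&my_workers, &item->work); */
		queue_work(my_wq, &item->work);
	}

Here my_wq would come from alloc_workqueue(), as the patches do with
the WQ_UNBOUND | WQ_MEM_RECLAIM flags.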
Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 1/9] btrfs: Cleanup the unused struct async_sched.
The struct async_sched is not used by any code and can be removed.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/volumes.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 78b8717..12eaf89 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5031,13 +5031,6 @@ static void btrfs_end_bio(struct bio *bio, int err)
 	}
 }

-struct async_sched {
-	struct bio *bio;
-	int rw;
-	struct btrfs_fs_info *info;
-	struct btrfs_work work;
-};
-
 /*
  * see run_scheduled_bios for a description of why bios are collected for
  * async submit.
--
1.8.4
Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 2/9] btrfs: use kernel workqueue to replace the btrfs_workers functions
Use the kernel workqueue to replace those btrfs_workers that are only
used as normal workqueues.

Other btrfs_workers need extra features such as requeuing, high
priority and ordered work; they are not touched in this patch. The
untouched btrfs_workers are:

generic_worker:   the helper for other btrfs_workers
workers:          uses the ordering and high priority features
delalloc_workers: uses the ordering feature
submit_workers:   uses the requeue feature

All the other workers can be replaced with the kernel workqueue
directly.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h         |  39 +++----
 fs/btrfs/delayed-inode.c |   9 ++-
 fs/btrfs/disk-io.c       | 164 ++++++++++++++-----------------
 fs/btrfs/extent-tree.c   |   6 +-
 fs/btrfs/inode.c         |  38 +++----
 fs/btrfs/ordered-data.c  |  11 ++-
 fs/btrfs/ordered-data.h  |   4 +-
 fs/btrfs/qgroup.c        |  16 ++--
 fs/btrfs/raid56.c        |  37 +++----
 fs/btrfs/reada.c         |   8 +--
 fs/btrfs/scrub.c         |  84 ++++++------
 fs/btrfs/super.c         |  23 ++---
 12 files changed, 196 insertions(+), 243 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index e795bf1..0dd6ec9 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1202,7 +1202,7 @@ struct btrfs_caching_control {
 	struct list_head list;
 	struct mutex mutex;
 	wait_queue_head_t wait;
-	struct btrfs_work work;
+	struct work_struct work;
 	struct btrfs_block_group_cache *block_group;
 	u64 progress;
 	atomic_t count;
@@ -1479,25 +1479,26 @@ struct btrfs_fs_info {
 	struct btrfs_workers generic_worker;
 	struct btrfs_workers workers;
 	struct btrfs_workers delalloc_workers;
-	struct btrfs_workers flush_workers;
-	struct btrfs_workers endio_workers;
-	struct btrfs_workers endio_meta_workers;
-	struct btrfs_workers endio_raid56_workers;
-	struct btrfs_workers rmw_workers;
-	struct btrfs_workers endio_meta_write_workers;
-	struct btrfs_workers endio_write_workers;
-	struct btrfs_workers endio_freespace_worker;
 	struct btrfs_workers submit_workers;
-	struct btrfs_workers caching_workers;
-	struct btrfs_workers readahead_workers;
+
+	struct workqueue_struct *flush_workers;
+	struct workqueue_struct *endio_workers;
+	struct workqueue_struct *endio_meta_workers;
+	struct workqueue_struct *endio_raid56_workers;
+	struct workqueue_struct *rmw_workers;
+	struct workqueue_struct *endio_meta_write_workers;
+	struct workqueue_struct *endio_write_workers;
+	struct workqueue_struct *endio_freespace_worker;
+	struct workqueue_struct *caching_workers;
+	struct workqueue_struct *readahead_workers;

 	/*
 	 * fixup workers take dirty pages that didn't properly go through
 	 * the cow mechanism and make them safe to write.  It happens
 	 * for the sys_munmap function call path
 	 */
-	struct btrfs_workers fixup_workers;
-	struct btrfs_workers delayed_workers;
+	struct workqueue_struct *fixup_workers;
+	struct workqueue_struct *delayed_workers;
 	struct task_struct *transaction_kthread;
 	struct task_struct *cleaner_kthread;
 	int thread_pool_size;
@@ -1576,9 +1577,9 @@ struct btrfs_fs_info {
 	wait_queue_head_t scrub_pause_wait;
 	struct rw_semaphore scrub_super_lock;
 	int scrub_workers_refcnt;
-	struct btrfs_workers scrub_workers;
-	struct btrfs_workers scrub_wr_completion_workers;
-	struct btrfs_workers scrub_nocow_workers;
+	struct workqueue_struct *scrub_workers;
+	struct workqueue_struct *scrub_wr_completion_workers;
+	struct workqueue_struct *scrub_nocow_workers;

 #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
 	u32 check_integrity_print_mask;
@@ -1619,9 +1620,9 @@ struct btrfs_fs_info {
 	/* qgroup rescan items */
 	struct mutex qgroup_rescan_lock; /* protects the progress item */
 	struct btrfs_key qgroup_rescan_progress;
-	struct btrfs_workers qgroup_rescan_workers;
+	struct workqueue_struct *qgroup_rescan_workers;
 	struct completion qgroup_rescan_completion;
-	struct btrfs_work qgroup_rescan_work;
+	struct work_struct qgroup_rescan_work;

 	/* filesystem state */
 	unsigned long fs_state;
@@ -3542,7 +3543,7 @@ struct btrfs_delalloc_work {
 	int delay_iput;
 	struct completion completion;
 	struct list_head list;
-	struct btrfs_work work;
+	struct work_struct work;
 };

 struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 5615eac..2b8da0a7 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1258,10 +1258,10 @@ void btrfs_remove_delayed_node(struct inode *inode)
 struct btrfs_async_delayed_work {
 	struct btrfs_delayed_root *delayed_root;
 	int nr;
-	struct btrfs_work work;
+	struct work_struct work;
 };

-static void btrfs_async_run_delayed_root(struct btrfs_work *work)
+static void btrfs_async_run_delayed_root(struct work_struct *work)
 {
 	struct btrfs_async_delayed_work *async_work;
 	struct btrfs_delayed_root *delayed_root;
@@ -1359,11 +1359,10 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root,
 		return -ENOMEM;

 	async_work->delayed_root = delayed_root;
-	async_work->work.func = btrfs_async_run_delayed_root;
-	async_work->work.flags = 0;
+	INIT_WORK(&async_work->work, btrfs_async_run_delayed_root);
 	async_work->nr = nr;

-	btrfs_queue_worker(&root->fs_info->delayed_workers, &async_work->work);
+	queue_work(root->fs_info->delayed_workers, &async_work->work);
 	return 0;
 }
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 3c2886c..d02a552 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -54,7 +54,7 @@
 #endif

 static struct extent_io_ops btree_extent_io_ops;
-static void end_workqueue_fn(struct btrfs_work *work);
+static void end_workqueue_fn(struct work_struct *work);
 static void free_fs_root(struct btrfs_root *root);
 static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info,
 				    int read_only);
@@ -86,7 +86,7 @@ struct end_io_wq {
 	int error;
 	int metadata;
 	struct list_head list;
-	struct btrfs_work work;
+	struct work_struct work;
 };

 /*
@@ -692,31 +692,30 @@ static void end_workqueue_bio(struct bio *bio, int err)
 	fs_info = end_io_wq->info;

 	end_io_wq->error = err;
-	end_io_wq->work.func = end_workqueue_fn;
-	end_io_wq->work.flags = 0;
+	INIT_WORK(&end_io_wq->work, end_workqueue_fn);

 	if (bio->bi_rw & REQ_WRITE) {
 		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA)
-			btrfs_queue_worker(&fs_info->endio_meta_write_workers,
-					   &end_io_wq->work);
+			queue_work(fs_info->endio_meta_write_workers,
+				   &end_io_wq->work);
 		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE)
-			btrfs_queue_worker(&fs_info->endio_freespace_worker,
-					   &end_io_wq->work);
+			queue_work(fs_info->endio_freespace_worker,
+				   &end_io_wq->work);
 		else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
-			btrfs_queue_worker(&fs_info->endio_raid56_workers,
-					   &end_io_wq->work);
+			queue_work(fs_info->endio_raid56_workers,
+				   &end_io_wq->work);
 		else
-			btrfs_queue_worker(&fs_info->endio_write_workers,
-					   &end_io_wq->work);
+			queue_work(fs_info->endio_write_workers,
+				   &end_io_wq->work);
 	} else {
 		if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56)
-			btrfs_queue_worker(&fs_info->endio_raid56_workers,
+			queue_work(fs_info->endio_raid56_workers,
 				   &end_io_wq->work);
 		else if (end_io_wq->metadata)
-			btrfs_queue_worker(&fs_info->endio_meta_workers,
+			queue_work(fs_info->endio_meta_workers,
 				   &end_io_wq->work);
 		else
-			btrfs_queue_worker(&fs_info->endio_workers,
+			queue_work(fs_info->endio_workers,
 				   &end_io_wq->work);
 	}
 }
@@ -1662,7 +1661,7 @@ static int setup_bdi(struct btrfs_fs_info *info, struct backing_dev_info *bdi)
  * called by the kthread helper functions to finally call the bio end_io
  * functions.  This is where read checksum verification actually happens
  */
-static void end_workqueue_fn(struct btrfs_work *work)
+static void end_workqueue_fn(struct work_struct *work)
 {
 	struct bio *bio;
 	struct end_io_wq *end_io_wq;
@@ -1987,22 +1986,22 @@ static noinline int next_root_backup(struct btrfs_fs_info *info,
 static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 {
 	btrfs_stop_workers(&fs_info->generic_worker);
-	btrfs_stop_workers(&fs_info->fixup_workers);
 	btrfs_stop_workers(&fs_info->delalloc_workers);
 	btrfs_stop_workers(&fs_info->workers);
-	btrfs_stop_workers(&fs_info->endio_workers);
-	btrfs_stop_workers(&fs_info->endio_meta_workers);
-	btrfs_stop_workers(&fs_info->endio_raid56_workers);
-	btrfs_stop_workers(&fs_info->rmw_workers);
-	btrfs_stop_workers(&fs_info->endio_meta_write_workers);
-	btrfs_stop_workers(&fs_info->endio_write_workers);
-	btrfs_stop_workers(&fs_info->endio_freespace_worker);
 	btrfs_stop_workers(&fs_info->submit_workers);
-	btrfs_stop_workers(&fs_info->delayed_workers);
-	btrfs_stop_workers(&fs_info->caching_workers);
-	btrfs_stop_workers(&fs_info->readahead_workers);
-	btrfs_stop_workers(&fs_info->flush_workers);
-	btrfs_stop_workers(&fs_info->qgroup_rescan_workers);
+	destroy_workqueue(fs_info->fixup_workers);
+	destroy_workqueue(fs_info->endio_workers);
+	destroy_workqueue(fs_info->endio_meta_workers);
+	destroy_workqueue(fs_info->endio_raid56_workers);
+	destroy_workqueue(fs_info->rmw_workers);
+	destroy_workqueue(fs_info->endio_meta_write_workers);
+	destroy_workqueue(fs_info->endio_write_workers);
+	destroy_workqueue(fs_info->endio_freespace_worker);
+	destroy_workqueue(fs_info->delayed_workers);
+	destroy_workqueue(fs_info->caching_workers);
+	destroy_workqueue(fs_info->readahead_workers);
+	destroy_workqueue(fs_info->flush_workers);
+	destroy_workqueue(fs_info->qgroup_rescan_workers);
 }

 /* helper to cleanup tree roots */
@@ -2099,6 +2098,8 @@ int open_ctree(struct super_block *sb,
 	struct btrfs_root *quota_root;
 	struct btrfs_root *log_tree_root;
 	int ret;
+	int max_active;
+	int flags = WQ_UNBOUND | WQ_MEM_RECLAIM;
 	int err = -EINVAL;
 	int num_backups_tried = 0;
 	int backup_index = 0;
@@ -2457,6 +2458,7 @@ int open_ctree(struct super_block *sb,
 		goto fail_alloc;
 	}

+	max_active = fs_info->thread_pool_size;
 	btrfs_init_workers(&fs_info->generic_worker,
 			   "genwork", 1, NULL);

@@ -2468,23 +2470,13 @@ int open_ctree(struct super_block *sb,
 			   fs_info->thread_pool_size,
 			   &fs_info->generic_worker);

-	btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc",
-			   fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
-
+	fs_info->flush_workers = alloc_workqueue("flush_delalloc", flags,
+						 max_active);
 	btrfs_init_workers(&fs_info->submit_workers, "submit",
 			   min_t(u64, fs_devices->num_devices,
 			   fs_info->thread_pool_size),
 			   &fs_info->generic_worker);
-
-	btrfs_init_workers(&fs_info->caching_workers, "cache",
-			   2, &fs_info->generic_worker);
-
-	/* a higher idle thresh on the submit workers makes it much more
-	 * likely that bios will be send down in a sane order to the
-	 * devices
-	 */
-	fs_info->submit_workers.idle_thresh = 64;
+	fs_info->caching_workers = alloc_workqueue("cache", flags, 2);

 	fs_info->workers.idle_thresh = 16;
 	fs_info->workers.ordered = 1;
@@ -2492,72 +2484,42 @@ int open_ctree(struct super_block *sb,
 	fs_info->delalloc_workers.idle_thresh = 2;
 	fs_info->delalloc_workers.ordered = 1;

-	btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1,
-			   &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->endio_workers, "endio",
-			   fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->endio_meta_workers, "endio-meta",
-			   fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->endio_meta_write_workers,
-			   "endio-meta-write", fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->endio_raid56_workers,
-			   "endio-raid56", fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->rmw_workers,
-			   "rmw", fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->endio_write_workers, "endio-write",
-			   fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->endio_freespace_worker, "freespace-write",
-			   1, &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->delayed_workers, "delayed-meta",
-			   fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->readahead_workers, "readahead",
-			   fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
-	btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1,
-			   &fs_info->generic_worker);
-
-	/*
-	 * endios are largely parallel and should have a very
-	 * low idle thresh
-	 */
-	fs_info->endio_workers.idle_thresh = 4;
-	fs_info->endio_meta_workers.idle_thresh = 4;
-	fs_info->endio_raid56_workers.idle_thresh = 4;
-	fs_info->rmw_workers.idle_thresh = 2;
-
-	fs_info->endio_write_workers.idle_thresh = 2;
-	fs_info->endio_meta_write_workers.idle_thresh = 2;
-	fs_info->readahead_workers.idle_thresh = 2;
-
+	fs_info->fixup_workers = alloc_workqueue("fixup", flags, 1);
+	fs_info->endio_workers = alloc_workqueue("endio", flags, max_active);
+	fs_info->endio_meta_workers = alloc_workqueue("endio-meta", flags,
+						      max_active);
+	fs_info->endio_meta_write_workers = alloc_workqueue("endio-meta-write",
+							    flags, max_active);
+	fs_info->endio_raid56_workers = alloc_workqueue("endio-raid56", flags,
+							max_active);
+	fs_info->rmw_workers = alloc_workqueue("rmw", flags, max_active);
+	fs_info->endio_write_workers = alloc_workqueue("endio-write", flags,
+						       max_active);
+	fs_info->endio_freespace_worker = alloc_workqueue("freespace-write",
+							  flags, 1);
+	fs_info->delayed_workers = alloc_workqueue("delayed_meta", flags,
+						   max_active);
+	fs_info->readahead_workers = alloc_workqueue("readahead", flags,
+						     max_active);
+	fs_info->qgroup_rescan_workers = alloc_workqueue("group-rescan",
+							 flags, 1);
 	/*
 	 * btrfs_start_workers can really only fail because of ENOMEM so just
 	 * return -ENOMEM if any of these fail.
 	 */
 	ret = btrfs_start_workers(&fs_info->workers);
 	ret |= btrfs_start_workers(&fs_info->generic_worker);
-	ret |= btrfs_start_workers(&fs_info->submit_workers);
 	ret |= btrfs_start_workers(&fs_info->delalloc_workers);
-	ret |= btrfs_start_workers(&fs_info->fixup_workers);
-	ret |= btrfs_start_workers(&fs_info->endio_workers);
-	ret |= btrfs_start_workers(&fs_info->endio_meta_workers);
-	ret |= btrfs_start_workers(&fs_info->rmw_workers);
-	ret |= btrfs_start_workers(&fs_info->endio_raid56_workers);
-	ret |= btrfs_start_workers(&fs_info->endio_meta_write_workers);
-	ret |= btrfs_start_workers(&fs_info->endio_write_workers);
-	ret |= btrfs_start_workers(&fs_info->endio_freespace_worker);
-	ret |= btrfs_start_workers(&fs_info->delayed_workers);
-	ret |= btrfs_start_workers(&fs_info->caching_workers);
-	ret |= btrfs_start_workers(&fs_info->readahead_workers);
-	ret |= btrfs_start_workers(&fs_info->flush_workers);
-	ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers);
-	if (ret) {
+	ret |= btrfs_start_workers(&fs_info->submit_workers);
+
+	if (ret || !(fs_info->flush_workers && fs_info->endio_workers &&
+		     fs_info->endio_meta_workers &&
+		     fs_info->endio_raid56_workers &&
+		     fs_info->rmw_workers && fs_info->qgroup_rescan_workers &&
+		     fs_info->endio_meta_write_workers &&
+		     fs_info->endio_write_workers &&
+		     fs_info->caching_workers && fs_info->readahead_workers &&
+		     fs_info->fixup_workers && fs_info->delayed_workers)) {
 		err = -ENOMEM;
 		goto fail_sb_buffer;
 	}
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 0236de7..c8f67d9 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -377,7 +377,7 @@ static u64 add_new_free_space(struct btrfs_block_group_cache *block_group,
 	return total_added;
 }

-static noinline void caching_thread(struct btrfs_work *work)
+static noinline void caching_thread(struct work_struct *work)
 {
 	struct btrfs_block_group_cache *block_group;
 	struct btrfs_fs_info *fs_info;
@@ -530,7 +530,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,
 	caching_ctl->block_group = cache;
 	caching_ctl->progress = cache->key.objectid;
 	atomic_set(&caching_ctl->count, 1);
-	caching_ctl->work.func = caching_thread;
+	INIT_WORK(&caching_ctl->work, caching_thread);

 	spin_lock(&cache->lock);
 	/*
@@ -621,7 +621,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache,

 	btrfs_get_block_group(cache);

-	btrfs_queue_worker(&fs_info->caching_workers, &caching_ctl->work);
+	queue_work(fs_info->caching_workers, &caching_ctl->work);

 	return ret;
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b7c2487..53901a5 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1818,10 +1818,10 @@ int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end,
 /* see btrfs_writepage_start_hook for details on why this is required */
 struct btrfs_writepage_fixup {
 	struct page *page;
-	struct btrfs_work work;
+	struct work_struct work;
 };

-static void btrfs_writepage_fixup_worker(struct btrfs_work *work)
+static void btrfs_writepage_fixup_worker(struct work_struct *work)
 {
 	struct btrfs_writepage_fixup *fixup;
 	struct btrfs_ordered_extent *ordered;
@@ -1912,9 +1912,9 @@ static int btrfs_writepage_start_hook(struct page *page, u64 start, u64 end)

 	SetPageChecked(page);
 	page_cache_get(page);
-	fixup->work.func = btrfs_writepage_fixup_worker;
+	INIT_WORK(&fixup->work, btrfs_writepage_fixup_worker);
 	fixup->page = page;
-	btrfs_queue_worker(&root->fs_info->fixup_workers, &fixup->work);
+	queue_work(root->fs_info->fixup_workers, &fixup->work);
 	return -EBUSY;
 }

@@ -2780,7 +2780,7 @@ out:
 	return ret;
 }

-static void finish_ordered_fn(struct btrfs_work *work)
+static void finish_ordered_fn(struct work_struct *work)
 {
 	struct btrfs_ordered_extent *ordered_extent;
 	ordered_extent = container_of(work, struct btrfs_ordered_extent, work);
@@ -2793,7 +2793,7 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
 	struct inode *inode = page->mapping->host;
 	struct btrfs_root *root = BTRFS_I(inode)->root;
 	struct btrfs_ordered_extent *ordered_extent = NULL;
-	struct btrfs_workers *workers;
+	struct workqueue_struct *workers;

 	trace_btrfs_writepage_end_io_hook(page, start, end, uptodate);

@@ -2802,14 +2802,13 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end,
 			    end - start + 1, uptodate))
 		return 0;

-	ordered_extent->work.func = finish_ordered_fn;
-	ordered_extent->work.flags = 0;
+	INIT_WORK(&ordered_extent->work, finish_ordered_fn);

 	if (btrfs_is_free_space_inode(inode))
-		workers = &root->fs_info->endio_freespace_worker;
+		workers = root->fs_info->endio_freespace_worker;
 	else
-		workers = &root->fs_info->endio_write_workers;
-	btrfs_queue_worker(workers, &ordered_extent->work);
+		workers = root->fs_info->endio_write_workers;
+	queue_work(workers, &ordered_extent->work);

 	return 0;
 }
@@ -6906,10 +6905,9 @@ again:
 	if (!ret)
 		goto out_test;

-	ordered->work.func = finish_ordered_fn;
-	ordered->work.flags = 0;
-	btrfs_queue_worker(&root->fs_info->endio_write_workers,
-			   &ordered->work);
+	INIT_WORK(&ordered->work, finish_ordered_fn);
+	queue_work(root->fs_info->endio_write_workers, &ordered->work);
+
 out_test:
 	/*
 	 * our bio might span multiple ordered extents.  If we haven't
@@ -8187,7 +8185,7 @@ out_notrans:
 	return ret;
 }

-static void btrfs_run_delalloc_work(struct btrfs_work *work)
+static void btrfs_run_delalloc_work(struct work_struct *work)
 {
 	struct btrfs_delalloc_work *delalloc_work;

@@ -8206,7 +8204,7 @@ static void btrfs_run_delalloc_work(struct btrfs_work *work)
 }

 struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
-						    int wait, int delay_iput)
+						      int wait, int delay_iput)
 {
 	struct btrfs_delalloc_work *work;

@@ -8219,8 +8217,7 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode,
 	work->inode = inode;
 	work->wait = wait;
 	work->delay_iput = delay_iput;
-	work->work.func = btrfs_run_delalloc_work;
-
+	INIT_WORK(&work->work, btrfs_run_delalloc_work);
 	return work;
 }

@@ -8267,8 +8264,7 @@ static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput)
 			goto out;
 		}
 		list_add_tail(&work->list, &works);
-		btrfs_queue_worker(&root->fs_info->flush_workers,
-				   &work->work);
+		queue_work(root->fs_info->flush_workers, &work->work);

 		cond_resched();
 		spin_lock(&root->delalloc_lock);
diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
index 8136982..9b5ccac 100644
--- a/fs/btrfs/ordered-data.c
+++ b/fs/btrfs/ordered-data.c
@@ -552,7 +552,7 @@ void btrfs_remove_ordered_extent(struct inode *inode,
 	wake_up(&entry->wait);
 }

-static void btrfs_run_ordered_extent_work(struct btrfs_work *work)
+static void btrfs_run_ordered_extent_work(struct work_struct *work)
 {
 	struct btrfs_ordered_extent *ordered;

@@ -594,10 +594,9 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root, int delay_iput)
 		atomic_inc(&ordered->refs);
 		spin_unlock(&root->ordered_extent_lock);

-		ordered->flush_work.func = btrfs_run_ordered_extent_work;
+		INIT_WORK(&ordered->flush_work, btrfs_run_ordered_extent_work);
 		list_add_tail(&ordered->work_list, &works);
-		btrfs_queue_worker(&root->fs_info->flush_workers,
-				   &ordered->flush_work);
+		queue_work(root->fs_info->flush_workers, &ordered->flush_work);

 		cond_resched();
 		spin_lock(&root->ordered_extent_lock);
@@ -706,8 +705,8 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans,
 			goto out;
 		}
 		list_add_tail(&work->list, &works);
-		btrfs_queue_worker(&root->fs_info->flush_workers,
-				   &work->work);
+		queue_work(root->fs_info->flush_workers,
+			   &work->work);

 		cond_resched();
 		spin_lock(&root->fs_info->ordered_root_lock);
diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h
index 68844d5..f4c81d7 100644
--- a/fs/btrfs/ordered-data.h
+++ b/fs/btrfs/ordered-data.h
@@ -123,10 +123,10 @@ struct btrfs_ordered_extent {
 	/* a per root list of all the pending ordered extents */
 	struct list_head root_extent_list;

-	struct btrfs_work work;
+	struct work_struct work;

 	struct completion completion;
-	struct btrfs_work flush_work;
+	struct work_struct flush_work;
 	struct list_head work_list;
 };
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index 1280eff..a49fdfe 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1528,8 +1528,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans,
 		ret = qgroup_rescan_init(fs_info, 0, 1);
 		if (!ret) {
 			qgroup_rescan_zero_tracking(fs_info);
-			btrfs_queue_worker(&fs_info->qgroup_rescan_workers,
-					   &fs_info->qgroup_rescan_work);
+			queue_work(fs_info->qgroup_rescan_workers,
+				   &fs_info->qgroup_rescan_work);
 		}
 		ret = 0;
 	}
@@ -1994,7 +1994,7 @@ out:
 	return ret;
 }

-static void btrfs_qgroup_rescan_worker(struct btrfs_work *work)
+static void btrfs_qgroup_rescan_worker(struct work_struct *work)
 {
 	struct btrfs_fs_info *fs_info = container_of(work, struct btrfs_fs_info,
 						     qgroup_rescan_work);
@@ -2105,7 +2105,7 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid,

 	memset(&fs_info->qgroup_rescan_work, 0,
 	       sizeof(fs_info->qgroup_rescan_work));
-	fs_info->qgroup_rescan_work.func = btrfs_qgroup_rescan_worker;
+	INIT_WORK(&fs_info->qgroup_rescan_work, btrfs_qgroup_rescan_worker);

 	if (ret) {
 err:
@@ -2168,8 +2168,8 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info)

 	qgroup_rescan_zero_tracking(fs_info);

-	btrfs_queue_worker(&fs_info->qgroup_rescan_workers,
-			   &fs_info->qgroup_rescan_work);
+	queue_work(fs_info->qgroup_rescan_workers,
+		   &fs_info->qgroup_rescan_work);

 	return 0;
 }
@@ -2200,6 +2200,6 @@ void btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info)
 {
 	if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN)
-		btrfs_queue_worker(&fs_info->qgroup_rescan_workers,
-				   &fs_info->qgroup_rescan_work);
+		queue_work(fs_info->qgroup_rescan_workers,
+			   &fs_info->qgroup_rescan_work);
 }
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0525e13..4b7769d 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -88,7 +88,7 @@ struct btrfs_raid_bio {
 	/*
 	 * for scheduling work in the helper threads
 	 */
-	struct btrfs_work work;
+	struct work_struct work;

 	/*
 	 * bio list and bio_list_lock are used
@@ -167,8 +167,8 @@ struct btrfs_raid_bio {

 static int __raid56_parity_recover(struct btrfs_raid_bio *rbio);
 static noinline void finish_rmw(struct btrfs_raid_bio *rbio);
-static void rmw_work(struct btrfs_work *work);
-static void read_rebuild_work(struct btrfs_work *work);
+static void rmw_work(struct work_struct *work);
+static void read_rebuild_work(struct work_struct *work);
 static void async_rmw_stripe(struct btrfs_raid_bio *rbio);
 static void async_read_rebuild(struct btrfs_raid_bio *rbio);
 static int fail_bio_stripe(struct btrfs_raid_bio *rbio, struct bio *bio);
@@ -1417,20 +1417,16 @@ cleanup:

 static void async_rmw_stripe(struct btrfs_raid_bio *rbio)
 {
-	rbio->work.flags = 0;
-	rbio->work.func = rmw_work;
-
-	btrfs_queue_worker(&rbio->fs_info->rmw_workers,
-			   &rbio->work);
+	INIT_WORK(&rbio->work, rmw_work);
+	queue_work(rbio->fs_info->rmw_workers,
+		   &rbio->work);
 }

 static void async_read_rebuild(struct btrfs_raid_bio *rbio)
 {
-	rbio->work.flags = 0;
-	rbio->work.func = read_rebuild_work;
-
-	btrfs_queue_worker(&rbio->fs_info->rmw_workers,
-			   &rbio->work);
+	INIT_WORK(&rbio->work, read_rebuild_work);
+	queue_work(rbio->fs_info->rmw_workers,
+		   &rbio->work);
 }

 /*
@@ -1589,7 +1585,7 @@ struct btrfs_plug_cb {
 	struct blk_plug_cb cb;
 	struct btrfs_fs_info *info;
 	struct list_head rbio_list;
-	struct btrfs_work work;
+	struct work_struct work;
 };

 /*
@@ -1653,7 +1649,7 @@ static void run_plug(struct btrfs_plug_cb *plug)
  * if the unplug comes from schedule, we have to push the
  * work off to a helper thread
  */
-static void unplug_work(struct btrfs_work *work)
+static void unplug_work(struct work_struct *work)
 {
 	struct btrfs_plug_cb *plug;
 	plug = container_of(work, struct btrfs_plug_cb, work);
@@ -1666,10 +1662,9 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule)

 	plug = container_of(cb, struct btrfs_plug_cb, cb);
 	if (from_schedule) {
-		plug->work.flags = 0;
-		plug->work.func = unplug_work;
-		btrfs_queue_worker(&plug->info->rmw_workers,
-				   &plug->work);
+		INIT_WORK(&plug->work, unplug_work);
+		queue_work(plug->info->rmw_workers,
+			   &plug->work);
 		return;
 	}
 	run_plug(plug);
@@ -2083,7 +2078,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio,

 }

-static void rmw_work(struct btrfs_work *work)
+static void rmw_work(struct work_struct *work)
 {
 	struct btrfs_raid_bio *rbio;

@@ -2091,7 +2086,7 @@ static void rmw_work(struct btrfs_work *work)
 	raid56_rmw_stripe(rbio);
 }

-static void read_rebuild_work(struct btrfs_work *work)
+static void read_rebuild_work(struct work_struct *work)
 {
 	struct btrfs_raid_bio *rbio;
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index 1031b69..9607648 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -91,7 +91,7 @@ struct reada_zone {
 };

 struct reada_machine_work {
-	struct btrfs_work work;
+	struct work_struct work;
 	struct btrfs_fs_info *fs_info;
 };

@@ -732,7 +732,7 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info,

 }

-static void reada_start_machine_worker(struct btrfs_work *work)
+static void reada_start_machine_worker(struct work_struct *work)
 {
 	struct reada_machine_work *rmw;
 	struct btrfs_fs_info *fs_info;
@@ -792,10 +792,10 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info)
 		/* FIXME we cannot handle this properly right now */
 		BUG();
 	}
-	rmw->work.func = reada_start_machine_worker;
+	INIT_WORK(&rmw->work, reada_start_machine_worker);
 	rmw->fs_info = fs_info;

-	btrfs_queue_worker(&fs_info->readahead_workers, &rmw->work);
+	queue_work(fs_info->readahead_workers, &rmw->work);
 }

 #ifdef DEBUG
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 4ba2a69..025bb53 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -96,7 +96,7 @@ struct scrub_bio {
 #endif
 	int			page_count;
 	int			next_free;
-	struct btrfs_work	work;
+	struct work_struct	work;
 };

 struct scrub_block {
@@ -154,7 +154,7 @@ struct scrub_fixup_nodatasum {
 	struct btrfs_device	*dev;
 	u64			logical;
 	struct btrfs_root	*root;
-	struct btrfs_work	work;
+	struct work_struct	work;
 	int			mirror_num;
 };

@@ -164,7 +164,7 @@ struct scrub_copy_nocow_ctx {
 	u64			len;
 	int			mirror_num;
 	u64			physical_for_dev_replace;
-	struct btrfs_work	work;
+	struct work_struct	work;
 };

 struct scrub_warning {
@@ -224,7 +224,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 		       u64 gen, int mirror_num, u8 *csum, int force,
 		       u64 physical_for_dev_replace);
 static void scrub_bio_end_io(struct bio *bio, int err);
-static void scrub_bio_end_io_worker(struct btrfs_work *work);
+static void scrub_bio_end_io_worker(struct work_struct *work);
 static void scrub_block_complete(struct scrub_block *sblock);
 static void scrub_remap_extent(struct btrfs_fs_info *fs_info,
 			       u64 extent_logical, u64 extent_len,
@@ -241,14 +241,14 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx,
 				    struct scrub_page *spage);
 static void scrub_wr_submit(struct scrub_ctx *sctx);
 static void scrub_wr_bio_end_io(struct bio *bio, int err);
-static void scrub_wr_bio_end_io_worker(struct btrfs_work *work);
+static void scrub_wr_bio_end_io_worker(struct work_struct *work);
 static int write_page_nocow(struct scrub_ctx *sctx,
 			    u64 physical_for_dev_replace, struct page *page);
 static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root,
 				      void *ctx);
 static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 			    int mirror_num, u64 physical_for_dev_replace);
-static void copy_nocow_pages_worker(struct btrfs_work *work);
+static void copy_nocow_pages_worker(struct work_struct *work);


 static void scrub_pending_bio_inc(struct scrub_ctx *sctx)
@@ -386,7 +386,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace)
 		sbio->index = i;
 		sbio->sctx = sctx;
 		sbio->page_count = 0;
-		sbio->work.func = scrub_bio_end_io_worker;
+		INIT_WORK(&sbio->work, scrub_bio_end_io_worker);

 		if (i != SCRUB_BIOS_PER_SCTX - 1)
 			sctx->bios[i]->next_free = i + 1;
@@ -691,7 +691,7 @@ out:
 	return -EIO;
 }

-static void scrub_fixup_nodatasum(struct btrfs_work *work)
+static void scrub_fixup_nodatasum(struct work_struct *work)
 {
 	int ret;
 	struct scrub_fixup_nodatasum *fixup;
@@ -956,9 +956,8 @@ nodatasum_case:
 		fixup_nodatasum->root = fs_info->extent_root;
 		fixup_nodatasum->mirror_num = failed_mirror_index + 1;
 		scrub_pending_trans_workers_inc(sctx);
-		fixup_nodatasum->work.func = scrub_fixup_nodatasum;
-		btrfs_queue_worker(&fs_info->scrub_workers,
-				   &fixup_nodatasum->work);
+		INIT_WORK(&fixup_nodatasum->work, scrub_fixup_nodatasum);
+		queue_work(fs_info->scrub_workers, &fixup_nodatasum->work);
 		goto out;
 	}

@@ -1592,11 +1591,11 @@ static void scrub_wr_bio_end_io(struct bio *bio, int err)
 	sbio->err = err;
 	sbio->bio = bio;

-	sbio->work.func = scrub_wr_bio_end_io_worker;
-	btrfs_queue_worker(&fs_info->scrub_wr_completion_workers, &sbio->work);
+	INIT_WORK(&sbio->work, scrub_wr_bio_end_io_worker);
+	queue_work(fs_info->scrub_wr_completion_workers, &sbio->work);
 }

-static void scrub_wr_bio_end_io_worker(struct btrfs_work *work)
+static void scrub_wr_bio_end_io_worker(struct work_struct *work)
 {
 	struct scrub_bio *sbio = container_of(work, struct scrub_bio, work);
 	struct scrub_ctx *sctx = sbio->sctx;
@@ -2061,10 +2060,10 @@ static void scrub_bio_end_io(struct bio *bio, int err)
 	sbio->err = err;
 	sbio->bio = bio;

-	btrfs_queue_worker(&fs_info->scrub_workers, &sbio->work);
+	queue_work(fs_info->scrub_workers, &sbio->work);
 }

-static void scrub_bio_end_io_worker(struct btrfs_work *work)
+static void scrub_bio_end_io_worker(struct work_struct *work)
 {
 	struct scrub_bio *sbio = container_of(work, struct scrub_bio, work);
 	struct scrub_ctx *sctx = sbio->sctx;
@@ -2778,34 +2777,33 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info,
 						int is_dev_replace)
 {
 	int ret = 0;
+	int flags = WQ_UNBOUND | WQ_MEM_RECLAIM;
+	int max_active = fs_info->thread_pool_size;

 	mutex_lock(&fs_info->scrub_lock);
 	if (fs_info->scrub_workers_refcnt == 0) {
 		if (is_dev_replace)
-			btrfs_init_workers(&fs_info->scrub_workers, "scrub", 1,
-					   &fs_info->generic_worker);
+			fs_info->scrub_workers =
+				alloc_workqueue("scrub", flags, 1);
 		else
-			btrfs_init_workers(&fs_info->scrub_workers, "scrub",
-					   fs_info->thread_pool_size,
-					   &fs_info->generic_worker);
-		fs_info->scrub_workers.idle_thresh = 4;
-		ret = btrfs_start_workers(&fs_info->scrub_workers);
-		if (ret)
+			fs_info->scrub_workers =
+				alloc_workqueue("scrub", flags, max_active);
+		if (!fs_info->scrub_workers) {
+			ret = -ENOMEM;
 			goto out;
-		btrfs_init_workers(&fs_info->scrub_wr_completion_workers,
-				   "scrubwrc",
-				   fs_info->thread_pool_size,
-				   &fs_info->generic_worker);
-		fs_info->scrub_wr_completion_workers.idle_thresh = 2;
-		ret = btrfs_start_workers(
-				&fs_info->scrub_wr_completion_workers);
-		if (ret)
+		}
+		fs_info->scrub_wr_completion_workers =
+			alloc_workqueue("scrubwrc", flags, max_active);
+		if (!fs_info->scrub_wr_completion_workers) {
+			ret = -ENOMEM;
 			goto out;
-		btrfs_init_workers(&fs_info->scrub_nocow_workers, "scrubnc", 1,
-				   &fs_info->generic_worker);
-		ret = btrfs_start_workers(&fs_info->scrub_nocow_workers);
-		if (ret)
+		}
+		fs_info->scrub_nocow_workers =
+			alloc_workqueue("scrubnc", flags, 1);
+		if (!fs_info->scrub_nocow_workers) {
+			ret = -ENOMEM;
 			goto out;
+		}
 	}
 	++fs_info->scrub_workers_refcnt;
 out:
@@ -2818,9 +2816,9 @@ static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info)
 {
 	mutex_lock(&fs_info->scrub_lock);
 	if (--fs_info->scrub_workers_refcnt == 0) {
-		btrfs_stop_workers(&fs_info->scrub_workers);
-		btrfs_stop_workers(&fs_info->scrub_wr_completion_workers);
-		btrfs_stop_workers(&fs_info->scrub_nocow_workers);
+		destroy_workqueue(fs_info->scrub_workers);
+		destroy_workqueue(fs_info->scrub_wr_completion_workers);
+		destroy_workqueue(fs_info->scrub_nocow_workers);
 	}
 	WARN_ON(fs_info->scrub_workers_refcnt < 0);
 	mutex_unlock(&fs_info->scrub_lock);
@@ -3130,14 +3128,14 @@ static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len,
 	nocow_ctx->len = len;
 	nocow_ctx->mirror_num = mirror_num;
 	nocow_ctx->physical_for_dev_replace = physical_for_dev_replace;
-	nocow_ctx->work.func = copy_nocow_pages_worker;
-	btrfs_queue_worker(&fs_info->scrub_nocow_workers,
-			   &nocow_ctx->work);
+	INIT_WORK(&nocow_ctx->work, copy_nocow_pages_worker);
+	queue_work(fs_info->scrub_nocow_workers,
+		   &nocow_ctx->work);

 	return 0;
 }

-static void copy_nocow_pages_worker(struct btrfs_work *work)
+static void copy_nocow_pages_worker(struct work_struct *work)
 {
 	struct scrub_copy_nocow_ctx *nocow_ctx =
 		container_of(work, struct scrub_copy_nocow_ctx, work);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 8eb6191..f557ab6 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1177,16 +1177,19 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
 	btrfs_set_max_workers(&fs_info->workers, new_pool_size);
 	btrfs_set_max_workers(&fs_info->delalloc_workers, new_pool_size);
 	btrfs_set_max_workers(&fs_info->submit_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->caching_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->fixup_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->endio_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->endio_meta_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->endio_meta_write_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->endio_write_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->endio_freespace_worker, new_pool_size);
-	btrfs_set_max_workers(&fs_info->delayed_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->readahead_workers, new_pool_size);
-	btrfs_set_max_workers(&fs_info->scrub_wr_completion_workers,
+	workqueue_set_max_active(fs_info->caching_workers, new_pool_size);
+	workqueue_set_max_active(fs_info->fixup_workers, new_pool_size);
+	workqueue_set_max_active(fs_info->endio_workers, new_pool_size);
+	workqueue_set_max_active(fs_info->endio_meta_workers, new_pool_size);
+	workqueue_set_max_active(fs_info->endio_meta_write_workers,
+				 new_pool_size);
+	workqueue_set_max_active(fs_info->endio_write_workers,
+				 new_pool_size);
+	workqueue_set_max_active(fs_info->endio_freespace_worker,
+				 new_pool_size);
+	workqueue_set_max_active(fs_info->delayed_workers, new_pool_size);
+	workqueue_set_max_active(fs_info->readahead_workers, new_pool_size);
+	workqueue_set_max_active(fs_info->scrub_wr_completion_workers,
 			      new_pool_size);
 }
--
1.8.4
Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 3/9] btrfs: Add btrfs_workqueue_struct to implement ordered execution based on kernel workqueue
Use the kernel workqueue to implement a new btrfs_workqueue_struct,
which has the ordered execution feature of the old btrfs_workers.

The func is executed concurrently, while the ordered_func/ordered_free
are executed in the sequence the works were queued, each after its
corresponding func has finished.

The new btrfs_workqueue uses two kernel workqueues to implement the
original btrfs_workers behaviour: one for the normal work and one for
the ordered work.

The high priority workqueue is not added in this patch; that feature
comes in a following patch.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/Makefile |   3 +-
 fs/btrfs/bwq.c    | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/bwq.h    |  59 +++++++++++++++++++++++++++++
 3 files changed, 170 insertions(+), 1 deletion(-)
 create mode 100644 fs/btrfs/bwq.c
 create mode 100644 fs/btrfs/bwq.h

diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile
index 3932224..d7439df 100644
--- a/fs/btrfs/Makefile
+++ b/fs/btrfs/Makefile
@@ -8,7 +8,8 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \
 	   extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \
 	   export.o tree-log.o free-space-cache.o zlib.o lzo.o \
 	   compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \
-	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o
+	   reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \
+	   bwq.o

 btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o
 btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o
diff --git a/fs/btrfs/bwq.c b/fs/btrfs/bwq.c
new file mode 100644
index 0000000..feccf21
--- /dev/null
+++ b/fs/btrfs/bwq.c
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2013 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#include <linux/slab.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/freezer.h>
+#include <linux/workqueue.h>
+#include <linux/completion.h>
+#include "bwq.h"
+
+struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
+						     char *ordered_name,
+						     int max_active)
+{
+	int wq_flags = WQ_UNBOUND | WQ_MEM_RECLAIM;
+	struct btrfs_workqueue_struct *ret = kzalloc(sizeof(*ret), GFP_NOFS);
+
+	if (unlikely(!ret))
+		return NULL;
+
+	ret->normal_wq = alloc_workqueue(name, wq_flags, max_active);
+	if (unlikely(!ret->normal_wq)) {
+		kfree(ret);
+		return NULL;
+	}
+
+	ret->ordered_wq = alloc_ordered_workqueue(ordered_name,
+						  WQ_MEM_RECLAIM);
+	if (unlikely(!ret->ordered_wq)) {
+		destroy_workqueue(ret->normal_wq);
+		kfree(ret);
+		return NULL;
+	}
+
+	spin_lock_init(&ret->insert_lock);
+	return ret;
+}
+
+/*
+ * When in out-of-order mode (SSD), high concurrency is OK, so no need
+ * to do the completion things, just call the ordered_func after the
+ * normal work is done
+ */
+static void normal_work_helper(struct work_struct *arg)
+{
+	struct btrfs_work_struct *work;
+
+	work = container_of(arg, struct btrfs_work_struct, normal_work);
+	work->func(work);
+	complete(&work->normal_completion);
+}
+
+static void ordered_work_helper(struct work_struct *arg)
+{
+	struct btrfs_work_struct *work;
+
+	work = container_of(arg, struct btrfs_work_struct, ordered_work);
+	wait_for_completion(&work->normal_completion);
+	work->ordered_func(work);
+	if (work->ordered_free)
+		work->ordered_free(work);
+}
+
+void btrfs_init_work(struct btrfs_work_struct *work,
+		     void (*func)(struct btrfs_work_struct *),
+		     void (*ordered_func)(struct btrfs_work_struct *),
+		     void (*ordered_free)(struct btrfs_work_struct *))
+{
+	work->func = func;
+	work->ordered_func = ordered_func;
+	work->ordered_free = ordered_free;
+	init_completion(&work->normal_completion);
+}
+
+void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
+		      struct btrfs_work_struct *work)
+{
+	INIT_WORK(&work->normal_work, normal_work_helper);
+	INIT_WORK(&work->ordered_work, ordered_work_helper);
+	spin_lock(&wq->insert_lock);
+	queue_work(wq->normal_wq, &work->normal_work);
+	queue_work(wq->ordered_wq, &work->ordered_work);
+	spin_unlock(&wq->insert_lock);
+}
+
+void btrfs_destroy_workqueue(struct btrfs_workqueue_struct *wq)
+{
+	destroy_workqueue(wq->ordered_wq);
+	destroy_workqueue(wq->normal_wq);
+}
+
+void btrfs_workqueue_set_max(struct btrfs_workqueue_struct *wq, int max)
+{
+	workqueue_set_max_active(wq->normal_wq, max);
+}
diff --git a/fs/btrfs/bwq.h b/fs/btrfs/bwq.h
new file mode 100644
index 0000000..bf12c90
--- /dev/null
+++ b/fs/btrfs/bwq.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2013 Fujitsu.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public
+ * License v2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; if not, write to the
+ * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+ * Boston, MA 021110-1307, USA.
+ */
+
+#ifndef __BTRFS_WORK_QUEUE_
+#define __BTRFS_WORK_QUEUE_
+
+struct btrfs_workqueue_struct {
+	struct workqueue_struct *normal_wq;
+	struct workqueue_struct *ordered_wq;
+
+	/*
+	 * Spinlock to ensure that both ordered and normal work can
+	 * be inserted into each workqueue in the same sequence,
+	 * which will reduce the ordered_work waiting time and disk head moves.
+	 *
+	 * For HDD, without the lock, sequential read/write performance
+	 * will regress about 40% due to the extra waiting and seeking.
+	 */
+	spinlock_t insert_lock;
+};
+
+struct btrfs_work_struct {
+	void (*func)(struct btrfs_work_struct *arg);
+	void (*ordered_func)(struct btrfs_work_struct *arg);
+	void (*ordered_free)(struct btrfs_work_struct *arg);
+
+	/* Don't touch things below */
+	struct work_struct normal_work;
+	struct work_struct ordered_work;
+	struct completion normal_completion;
+};
+
+struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
+						     char *ordered_name,
+						     int max_active);
+void btrfs_init_work(struct btrfs_work_struct *work,
+		     void (*func)(struct btrfs_work_struct *),
+		     void (*ordered_func)(struct btrfs_work_struct *),
+		     void (*ordered_free)(struct btrfs_work_struct *));
+void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
+		      struct btrfs_work_struct *work);
+void btrfs_destroy_workqueue(struct btrfs_workqueue_struct *wq);
+void btrfs_workqueue_set_max(struct btrfs_workqueue_struct *wq, int max);
+#endif
--
1.8.4
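To make the ordered-execution contract concrete, here is a minimal
usage sketch of the API introduced above. The work item my_async_job
and its three callbacks are illustrative, not code from the series:
func (my_run) may run concurrently on several CPUs, while ordered_func
(my_finish) and ordered_free (my_free) always run in submission order.

	#include <linux/slab.h>
	#include "bwq.h"

	struct my_async_job {
		int id;
		struct btrfs_work_struct work;
	};

	static void my_run(struct btrfs_work_struct *w)	/* concurrent */
	{
		struct my_async_job *job =
			container_of(w, struct my_async_job, work);

		/* heavy, parallelizable part of the job for job->id */
	}

	static void my_finish(struct btrfs_work_struct *w)	/* in queue order */
	{
		/* publish results; runs strictly in the order jobs were queued */
	}

	static void my_free(struct btrfs_work_struct *w)	/* also in order */
	{
		kfree(container_of(w, struct my_async_job, work));
	}

	static int my_submit(struct btrfs_workqueue_struct *wq, int id)
	{
		struct my_async_job *job = kmalloc(sizeof(*job), GFP_NOFS);

		if (!job)
			return -ENOMEM;
		job->id = id;
		btrfs_init_work(&job->work, my_run, my_finish, my_free);
		btrfs_queue_work(wq, &job->work);
		return 0;
	}

Here wq would come from btrfs_alloc_workqueue("demo", "demo-ordered",
max_active), with "demo"/"demo-ordered" as placeholder queue names.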
Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 4/9] btrfs: Add high priority workqueue support for btrfs_workqueue_struct
Add high priority work support: a new high priority workqueue is added
to btrfs_workqueue_struct. Whether the high priority workqueue is
available must be decided at initialization time.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/bwq.c | 29 ++++++++++++++++++++++++++++-
 fs/btrfs/bwq.h |  8 ++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/bwq.c b/fs/btrfs/bwq.c
index feccf21..c2a089c 100644
--- a/fs/btrfs/bwq.c
+++ b/fs/btrfs/bwq.c
@@ -27,6 +27,7 @@

 struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
 						     char *ordered_name,
+						     char *high_name,
 						     int max_active)
 {
 	int wq_flags = WQ_UNBOUND | WQ_MEM_RECLAIM;
@@ -46,6 +47,17 @@ struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
 		kfree(ret);
 		return NULL;
 	}
+	if (high_name) {
+		ret->high_wq = alloc_workqueue(high_name, wq_flags | WQ_HIGHPRI,
+					       max_active);
+		if (unlikely(!ret->high_wq)) {
+			destroy_workqueue(ret->normal_wq);
+			destroy_workqueue(ret->ordered_wq);
+			kfree(ret);
+			return NULL;
+		}
+	}
+
 	spin_lock_init(&ret->insert_lock);
 	return ret;
@@ -89,10 +101,16 @@ void btrfs_init_work(struct btrfs_work_struct *work,
 void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
 		      struct btrfs_work_struct *work)
 {
+	struct workqueue_struct *dest_wq;
+
+	if (work->high && wq->high_wq)
+		dest_wq = wq->high_wq;
+	else
+		dest_wq = wq->normal_wq;
+
 	INIT_WORK(&work->normal_work, normal_work_helper);
 	INIT_WORK(&work->ordered_work, ordered_work_helper);
 	spin_lock(&wq->insert_lock);
-	queue_work(wq->normal_wq, &work->normal_work);
+	queue_work(dest_wq, &work->normal_work);
 	queue_work(wq->ordered_wq, &work->ordered_work);
 	spin_unlock(&wq->insert_lock);
 }
@@ -100,10 +118,19 @@ void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
 void btrfs_destroy_workqueue(struct btrfs_workqueue_struct *wq)
 {
 	destroy_workqueue(wq->ordered_wq);
+	if (wq->high_wq)
+		destroy_workqueue(wq->high_wq);
 	destroy_workqueue(wq->normal_wq);
 }

 void btrfs_workqueue_set_max(struct btrfs_workqueue_struct *wq, int max)
 {
 	workqueue_set_max_active(wq->normal_wq, max);
+	if (wq->high_wq)
+		workqueue_set_max_active(wq->high_wq, max);
+}
+
+void btrfs_set_work_high_priority(struct btrfs_work_struct *work)
+{
+	work->high = 1;
 }
diff --git a/fs/btrfs/bwq.h b/fs/btrfs/bwq.h
index bf12c90..d9a7ded 100644
--- a/fs/btrfs/bwq.h
+++ b/fs/btrfs/bwq.h
@@ -22,6 +22,7 @@
 struct btrfs_workqueue_struct {
 	struct workqueue_struct *normal_wq;
 	struct workqueue_struct *ordered_wq;
+	struct workqueue_struct *high_wq;

 	/*
 	 * Spinlock to ensure that both ordered and normal work can
@@ -43,10 +44,16 @@ struct btrfs_work_struct {
 	struct work_struct normal_work;
 	struct work_struct ordered_work;
 	struct completion normal_completion;
+	int high;
 };

+/*
+ * name and ordered_name are mandatory; if high_name is not given (NULL),
+ * the high priority workqueue feature will not be available
+ */
 struct btrfs_workqueue_struct *btrfs_alloc_workqueue(char *name,
 						     char *ordered_name,
+						     char *high_name,
 						     int max_active);
 void btrfs_init_work(struct btrfs_work_struct *work,
 		     void (*func)(struct btrfs_work_struct *),
@@ -56,4 +63,5 @@ void btrfs_queue_work(struct btrfs_workqueue_struct *wq,
 		      struct btrfs_work_struct *work);
 void btrfs_destroy_workqueue(struct btrfs_workqueue_struct *wq);
 void btrfs_workqueue_set_max(struct btrfs_workqueue_struct *wq, int max);
+void btrfs_set_work_high_priority(struct btrfs_work_struct *work);
 #endif
--
1.8.4
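A short sketch of how a caller opts a single work item into the high
priority queue, reusing the illustrative my_async_job names from the
sketch after patch 3/9 (patch 5/9 does essentially this for REQ_SYNC
bios). The queue must have been created with a non-NULL high_name,
e.g. btrfs_alloc_workqueue("demo", "demo-ordered", "demo-high",
max_active):

	static void my_submit_job(struct btrfs_workqueue_struct *wq,
				  struct my_async_job *job, bool urgent)
	{
		btrfs_init_work(&job->work, my_run, my_finish, my_free);
		/* only takes effect if the queue has a high_wq */
		if (urgent)
			btrfs_set_work_high_priority(&job->work);
		btrfs_queue_work(wq, &job->work);
	}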
Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 5/9] btrfs: Use btrfs_workqueue_struct to replace the fs_info->workers
Use the newly created btrfs_workqueue_struct to replace the original
fs_info->workers.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 36 +++++++++++++++---------------------
 fs/btrfs/super.c   |  3 ++-
 3 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0dd6ec9..2662ef2 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1477,10 +1477,10 @@ struct btrfs_fs_info {
 	 * two
 	 */
 	struct btrfs_workers generic_worker;
-	struct btrfs_workers workers;
 	struct btrfs_workers delalloc_workers;
 	struct btrfs_workers submit_workers;

+	struct btrfs_workqueue_struct *workers;
 	struct workqueue_struct *flush_workers;
 	struct workqueue_struct *endio_workers;
 	struct workqueue_struct *endio_meta_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index d02a552..364c409 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -48,6 +48,7 @@
 #include "rcu-string.h"
 #include "dev-replace.h"
 #include "raid56.h"
+#include "bwq.h"

 #ifdef CONFIG_X86
 #include <asm/cpufeature.h>
@@ -108,7 +109,7 @@ struct async_submit_bio {
 	 * can't tell us where in the file the bio should go
 	 */
 	u64 bio_offset;
-	struct btrfs_work work;
+	struct btrfs_work_struct work;
 	int error;
 };

@@ -751,12 +752,12 @@ int btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
 unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info)
 {
 	unsigned long limit = min_t(unsigned long,
-				    info->workers.max_workers,
+				    info->thread_pool_size,
 				    info->fs_devices->open_devices);
 	return 256 * limit;
 }

-static void run_one_async_start(struct btrfs_work *work)
+static void run_one_async_start(struct btrfs_work_struct *work)
 {
 	struct async_submit_bio *async;
 	int ret;
@@ -769,7 +770,7 @@ static void run_one_async_start(struct btrfs_work *work)
 		async->error = ret;
 }

-static void run_one_async_done(struct btrfs_work *work)
+static void run_one_async_done(struct btrfs_work_struct *work)
 {
 	struct btrfs_fs_info *fs_info;
 	struct async_submit_bio *async;
@@ -796,7 +797,7 @@ static void run_one_async_done(struct btrfs_work *work)
 			       async->bio_offset);
 }

-static void run_one_async_free(struct btrfs_work *work)
+static void run_one_async_free(struct btrfs_work_struct *work)
 {
 	struct async_submit_bio *async;

@@ -824,11 +825,9 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
 	async->submit_bio_start = submit_bio_start;
 	async->submit_bio_done = submit_bio_done;

-	async->work.func = run_one_async_start;
-	async->work.ordered_func = run_one_async_done;
-	async->work.ordered_free = run_one_async_free;
+	btrfs_init_work(&async->work, run_one_async_start,
+			run_one_async_done, run_one_async_free);

-	async->work.flags = 0;
 	async->bio_flags = bio_flags;
 	async->bio_offset = bio_offset;

@@ -837,9 +836,9 @@ int btrfs_wq_submit_bio(struct btrfs_fs_info *fs_info, struct inode *inode,
 	atomic_inc(&fs_info->nr_async_submits);

 	if (rw & REQ_SYNC)
-		btrfs_set_work_high_prio(&async->work);
+		btrfs_set_work_high_priority(&async->work);

-	btrfs_queue_worker(&fs_info->workers, &async->work);
+	btrfs_queue_work(fs_info->workers, &async->work);

 	while (atomic_read(&fs_info->async_submit_draining) &&
 	      atomic_read(&fs_info->nr_async_submits)) {
@@ -1987,7 +1986,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 {
 	btrfs_stop_workers(&fs_info->generic_worker);
 	btrfs_stop_workers(&fs_info->delalloc_workers);
-	btrfs_stop_workers(&fs_info->workers);
+	btrfs_destroy_workqueue(fs_info->workers);
 	btrfs_stop_workers(&fs_info->submit_workers);
 	destroy_workqueue(fs_info->fixup_workers);
 	destroy_workqueue(fs_info->endio_workers);
@@ -2462,9 +2461,8 @@ int open_ctree(struct super_block *sb,
 	btrfs_init_workers(&fs_info->generic_worker,
 			   "genwork", 1, NULL);

-	btrfs_init_workers(&fs_info->workers, "worker",
-			   fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
+	fs_info->workers = btrfs_alloc_workqueue("worker", "worker-ordered",
+						 "worker-high", max_active);

 	btrfs_init_workers(&fs_info->delalloc_workers, "delalloc",
 			   fs_info->thread_pool_size,
@@ -2478,9 +2476,6 @@ int open_ctree(struct super_block *sb,
 			   &fs_info->generic_worker);
 	fs_info->caching_workers = alloc_workqueue("cache", flags, 2);

-	fs_info->workers.idle_thresh = 16;
-	fs_info->workers.ordered = 1;
-
 	fs_info->delalloc_workers.idle_thresh = 2;
 	fs_info->delalloc_workers.ordered = 1;

@@ -2507,13 +2502,12 @@ int open_ctree(struct super_block *sb,
 	 * btrfs_start_workers can really only fail because of ENOMEM so just
 	 * return -ENOMEM if any of these fail.
 	 */
-	ret = btrfs_start_workers(&fs_info->workers);
-	ret |= btrfs_start_workers(&fs_info->generic_worker);
+	ret = btrfs_start_workers(&fs_info->generic_worker);
 	ret |= btrfs_start_workers(&fs_info->delalloc_workers);
 	ret |= btrfs_start_workers(&fs_info->submit_workers);

 	if (ret || !(fs_info->flush_workers && fs_info->endio_workers &&
-		     fs_info->endio_meta_workers &&
+		     fs_info->endio_meta_workers && fs_info->workers &&
 		     fs_info->endio_raid56_workers &&
 		     fs_info->rmw_workers && fs_info->qgroup_rescan_workers &&
 		     fs_info->endio_meta_write_workers &&
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f557ab6..8fe41f9 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -56,6 +56,7 @@
 #include "rcu-string.h"
 #include "dev-replace.h"
 #include "free-space-cache.h"
+#include "bwq.h"

 #define CREATE_TRACE_POINTS
 #include <trace/events/btrfs.h>
@@ -1174,7 +1175,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
 	       old_pool_size, new_pool_size);

 	btrfs_set_max_workers(&fs_info->generic_worker, new_pool_size);
-	btrfs_set_max_workers(&fs_info->workers, new_pool_size);
+	btrfs_workqueue_set_max(fs_info->workers, new_pool_size);
 	btrfs_set_max_workers(&fs_info->delalloc_workers, new_pool_size);
 	btrfs_set_max_workers(&fs_info->submit_workers, new_pool_size);
 	workqueue_set_max_active(fs_info->caching_workers, new_pool_size);
--
1.8.4
Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 6/9] btrfs: Use btrfs_workqueue_struct to replace the fs_info->delalloc_workers
Much like the fs_info->workers, convert the fs_info->delalloc_workers
to use the same btrfs_workqueue_struct.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 13 +++++--------
 fs/btrfs/inode.c   | 19 +++++++++----------
 fs/btrfs/super.c   |  2 +-
 4 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 2662ef2..81aba0e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1477,10 +1477,10 @@ struct btrfs_fs_info {
 	 * two
 	 */
 	struct btrfs_workers generic_worker;
-	struct btrfs_workers delalloc_workers;
 	struct btrfs_workers submit_workers;
 
 	struct btrfs_workqueue_struct *workers;
+	struct btrfs_workqueue_struct *delalloc_workers;
 	struct workqueue_struct *flush_workers;
 	struct workqueue_struct *endio_workers;
 	struct workqueue_struct *endio_meta_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 364c409..fd795b6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1985,7 +1985,7 @@ static noinline int next_root_backup(struct btrfs_fs_info *info,
 static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 {
 	btrfs_stop_workers(&fs_info->generic_worker);
-	btrfs_stop_workers(&fs_info->delalloc_workers);
+	btrfs_destroy_workqueue(fs_info->delalloc_workers);
 	btrfs_destroy_workqueue(fs_info->workers);
 	btrfs_stop_workers(&fs_info->submit_workers);
 	destroy_workqueue(fs_info->fixup_workers);
@@ -2464,9 +2464,9 @@ int open_ctree(struct super_block *sb,
 	fs_info->workers = btrfs_alloc_workqueue("worker", "worker-ordered",
 						 "worker-high", max_active);
 
-	btrfs_init_workers(&fs_info->delalloc_workers, "delalloc",
-			   fs_info->thread_pool_size,
-			   &fs_info->generic_worker);
+	fs_info->delalloc_workers = btrfs_alloc_workqueue("delalloc",
+							  "delalloc-ordered",
+							  NULL, max_active);
 
 	fs_info->flush_workers = alloc_workqueue("flush_delalloc", flags,
 						 max_active);
@@ -2476,9 +2476,6 @@ int open_ctree(struct super_block *sb,
 			   &fs_info->generic_worker);
 	fs_info->caching_workers = alloc_workqueue("cache", flags, 2);
 
-	fs_info->delalloc_workers.idle_thresh = 2;
-	fs_info->delalloc_workers.ordered = 1;
-
 	fs_info->fixup_workers = alloc_workqueue("fixup", flags, 1);
 	fs_info->endio_workers = alloc_workqueue("endio", flags, max_active);
 	fs_info->endio_meta_workers = alloc_workqueue("endio-meta", flags,
@@ -2503,11 +2500,11 @@ int open_ctree(struct super_block *sb,
 	 * btrfs_start_workers can really only fail because of ENOMEM so just
 	 * return -ENOMEM if any of these fail.
*/ ret = btrfs_start_workers(&fs_info->generic_worker); - ret |= btrfs_start_workers(&fs_info->delalloc_workers); ret |= btrfs_start_workers(&fs_info->submit_workers); if (ret || !(fs_info->flush_workers && fs_info->endio_workers && fs_info->endio_meta_workers && fs_info->workers && + fs_info->delalloc_workers && fs_info->endio_raid56_workers && fs_info->rmw_workers && fs_info->qgroup_rescan_workers && fs_info->endio_meta_write_workers && diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 53901a5..0ae21a6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -59,6 +59,7 @@ #include "inode-map.h" #include "backref.h" #include "hash.h" +#include "bwq.h" struct btrfs_iget_args { u64 ino; @@ -295,7 +296,7 @@ struct async_cow { u64 start; u64 end; struct list_head extents; - struct btrfs_work work; + struct btrfs_work_struct work; }; static noinline int add_async_extent(struct async_cow *cow, @@ -1057,7 +1058,7 @@ static noinline int cow_file_range(struct inode *inode, /* * work queue call back to started compression on a file and pages */ -static noinline void async_cow_start(struct btrfs_work *work) +static noinline void async_cow_start(struct btrfs_work_struct *work) { struct async_cow *async_cow; int num_added = 0; @@ -1075,7 +1076,7 @@ static noinline void async_cow_start(struct btrfs_work *work) /* * work queue call back to submit previously compressed pages */ -static noinline void async_cow_submit(struct btrfs_work *work) +static noinline void async_cow_submit(struct btrfs_work_struct *work) { struct async_cow *async_cow; struct btrfs_root *root; @@ -1096,7 +1097,7 @@ static noinline void async_cow_submit(struct btrfs_work *work) submit_compressed_extents(async_cow->inode, async_cow); } -static noinline void async_cow_free(struct btrfs_work *work) +static noinline void async_cow_free(struct btrfs_work_struct *work) { struct async_cow *async_cow; async_cow = container_of(work, struct async_cow, work); @@ -1133,17 +1134,15 @@ static int cow_file_range_async(struct inode *inode, struct page *locked_page, async_cow->end = cur_end; INIT_LIST_HEAD(&async_cow->extents); - async_cow->work.func = async_cow_start; - async_cow->work.ordered_func = async_cow_submit; - async_cow->work.ordered_free = async_cow_free; - async_cow->work.flags = 0; + btrfs_init_work(&async_cow->work, async_cow_start, + async_cow_submit, async_cow_free); nr_pages = (cur_end - start + PAGE_CACHE_SIZE) >> PAGE_CACHE_SHIFT; atomic_add(nr_pages, &root->fs_info->async_delalloc_pages); - btrfs_queue_worker(&root->fs_info->delalloc_workers, - &async_cow->work); + btrfs_queue_work(root->fs_info->delalloc_workers, + &async_cow->work); if (atomic_read(&root->fs_info->async_delalloc_pages) > limit) { wait_event(root->fs_info->async_submit_wait, diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 8fe41f9..771b98a 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1176,7 +1176,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, btrfs_set_max_workers(&fs_info->generic_worker, new_pool_size); btrfs_workqueue_set_max(fs_info->workers, new_pool_size); - btrfs_set_max_workers(&fs_info->delalloc_workers, new_pool_size); + btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size); btrfs_set_max_workers(&fs_info->submit_workers, new_pool_size); workqueue_set_max_active(fs_info->caching_workers, new_pool_size); workqueue_set_max_active(fs_info->fixup_workers, new_pool_size); -- 1.8.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to 
majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
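The "delalloc"/"delalloc-ordered"/NULL triple hints at how
btrfs_alloc_workqueue() maps its three names onto kernel workqueues. The
sketch below is an assumption about the internals of bwq.c (patch 3/9,
not shown here), included only to illustrate why a NULL third name skips
the high-priority queue:

#include <linux/slab.h>
#include <linux/workqueue.h>

/* assumed internal layout, not the actual fs/btrfs/bwq.c */
struct btrfs_workqueue_struct {
	struct workqueue_struct *normal;	/* runs work->func */
	struct workqueue_struct *ordered;	/* completes ordered_func in order */
	struct workqueue_struct *high;		/* optional WQ_HIGHPRI variant */
};

static struct btrfs_workqueue_struct *
sketch_alloc_workqueue(const char *name, const char *ordered_name,
		       const char *high_name, int max_active)
{
	unsigned int flags = WQ_UNBOUND | WQ_MEM_RECLAIM;
	struct btrfs_workqueue_struct *wq = kzalloc(sizeof(*wq), GFP_NOFS);

	if (!wq)
		return NULL;
	wq->normal = alloc_workqueue("%s", flags, max_active, name);
	/* an ordered workqueue guarantees one-at-a-time, FIFO execution */
	wq->ordered = alloc_ordered_workqueue("%s", WQ_MEM_RECLAIM,
					      ordered_name);
	if (high_name)		/* NULL, as for delalloc: no high-prio queue */
		wq->high = alloc_workqueue("%s", flags | WQ_HIGHPRI,
					   max_active, high_name);
	if (!wq->normal || !wq->ordered || (high_name && !wq->high)) {
		if (wq->normal)
			destroy_workqueue(wq->normal);
		if (wq->ordered)
			destroy_workqueue(wq->ordered);
		if (wq->high)
			destroy_workqueue(wq->high);
		kfree(wq);
		return NULL;
	}
	return wq;
}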
Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 7/9] btrfs: Replace the fs_info->submit_workers with kernel workqueue.
Replace the submit workers with a kernel workqueue.

The submit_workers differs from the other workers in the following
respects:
1) Requeue:
   This is quite easy; queue_work() handles it directly.
2) Initialization:
   The work_struct in btrfs_device must be initialized carefully to
   prevent a broken work_struct from being queued.

Besides these, there is not much to worry about.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/ctree.h   |  2 +-
 fs/btrfs/disk-io.c | 10 ++++------
 fs/btrfs/super.c   |  2 +-
 fs/btrfs/volumes.c | 17 +++++++++--------
 fs/btrfs/volumes.h |  2 +-
 5 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 81aba0e..0cf4320 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1477,10 +1477,10 @@ struct btrfs_fs_info {
 	 * two
 	 */
 	struct btrfs_workers generic_worker;
-	struct btrfs_workers submit_workers;
 
 	struct btrfs_workqueue_struct *workers;
 	struct btrfs_workqueue_struct *delalloc_workers;
+	struct workqueue_struct *submit_workers;
 	struct workqueue_struct *flush_workers;
 	struct workqueue_struct *endio_workers;
 	struct workqueue_struct *endio_meta_workers;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index fd795b6..ca2257d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1987,7 +1987,7 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 	btrfs_stop_workers(&fs_info->generic_worker);
 	btrfs_destroy_workqueue(fs_info->delalloc_workers);
 	btrfs_destroy_workqueue(fs_info->workers);
-	btrfs_stop_workers(&fs_info->submit_workers);
+	destroy_workqueue(fs_info->submit_workers);
 	destroy_workqueue(fs_info->fixup_workers);
 	destroy_workqueue(fs_info->endio_workers);
 	destroy_workqueue(fs_info->endio_meta_workers);
@@ -2470,10 +2470,8 @@ int open_ctree(struct super_block *sb,
 	fs_info->flush_workers = alloc_workqueue("flush_delalloc", flags,
 						 max_active);
 
-	btrfs_init_workers(&fs_info->submit_workers, "submit",
-			   min_t(u64, fs_devices->num_devices,
-				 fs_info->thread_pool_size),
-			   &fs_info->generic_worker);
+	fs_info->submit_workers = alloc_workqueue("submit", flags,
+						  max_active);
 
 	fs_info->caching_workers = alloc_workqueue("cache", flags, 2);
 	fs_info->fixup_workers = alloc_workqueue("fixup", flags, 1);
@@ -2500,11 +2498,11 @@ int open_ctree(struct super_block *sb,
 	 * btrfs_start_workers can really only fail because of ENOMEM so just
 	 * return -ENOMEM if any of these fail.
*/ ret = btrfs_start_workers(&fs_info->generic_worker); - ret |= btrfs_start_workers(&fs_info->submit_workers); if (ret || !(fs_info->flush_workers && fs_info->endio_workers && fs_info->endio_meta_workers && fs_info->workers && fs_info->delalloc_workers && + fs_info->submit_workers && fs_info->endio_raid56_workers && fs_info->rmw_workers && fs_info->qgroup_rescan_workers && fs_info->endio_meta_write_workers && diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 771b98a..402b488 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1177,7 +1177,7 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, btrfs_set_max_workers(&fs_info->generic_worker, new_pool_size); btrfs_workqueue_set_max(fs_info->workers, new_pool_size); btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size); - btrfs_set_max_workers(&fs_info->submit_workers, new_pool_size); + workqueue_set_max_active(fs_info->submit_workers, new_pool_size); workqueue_set_max_active(fs_info->caching_workers, new_pool_size); workqueue_set_max_active(fs_info->fixup_workers, new_pool_size); workqueue_set_max_active(fs_info->endio_workers, new_pool_size); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 12eaf89..cb10e02 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -352,7 +352,7 @@ loop_lock: device->running_pending = 1; spin_unlock(&device->io_lock); - btrfs_requeue_work(&device->work); + queue_work(fs_info->submit_workers, &device->work); goto done; } /* unplug every 64 requests just for good measure */ @@ -376,7 +376,7 @@ done: blk_finish_plug(&plug); } -static void pending_bios_fn(struct btrfs_work *work) +static void pending_bios_fn(struct work_struct *work) { struct btrfs_device *device; @@ -421,7 +421,7 @@ static noinline int device_list_add(const char *path, } device->devid = devid; device->dev_stats_valid = 0; - device->work.func = pending_bios_fn; + INIT_WORK(&device->work, pending_bios_fn); memcpy(device->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE); spin_lock_init(&device->io_lock); @@ -507,7 +507,7 @@ static struct btrfs_fs_devices *clone_fs_devices(struct btrfs_fs_devices *orig) rcu_assign_pointer(device->name, name); device->devid = orig_dev->devid; - device->work.func = pending_bios_fn; + INIT_WORK(&device->work, pending_bios_fn); memcpy(device->uuid, orig_dev->uuid, sizeof(device->uuid)); spin_lock_init(&device->io_lock); INIT_LIST_HEAD(&device->dev_list); @@ -652,6 +652,7 @@ static int __btrfs_close_devices(struct btrfs_fs_devices *fs_devices) new_device->in_fs_metadata = 0; new_device->can_discard = 0; spin_lock_init(&new_device->io_lock); + INIT_WORK(&new_device->work, pending_bios_fn); list_replace_rcu(&device->dev_list, &new_device->dev_list); call_rcu(&device->rcu, free_device); @@ -1992,7 +1993,7 @@ int btrfs_init_new_device(struct btrfs_root *root, char *device_path) if (blk_queue_discard(q)) device->can_discard = 1; device->writeable = 1; - device->work.func = pending_bios_fn; + INIT_WORK(&device->work, pending_bios_fn); generate_random_uuid(device->uuid); spin_lock_init(&device->io_lock); device->generation = trans->transid; @@ -5087,8 +5088,8 @@ static noinline void btrfs_schedule_bio(struct btrfs_root *root, spin_unlock(&device->io_lock); if (should_queue) - btrfs_queue_worker(&root->fs_info->submit_workers, - &device->work); + queue_work(root->fs_info->submit_workers, + &device->work); } static int bio_size_ok(struct block_device *bdev, struct bio *bio, @@ -5313,7 +5314,7 @@ static struct btrfs_device *add_missing_dev(struct btrfs_root *root, 
list_add(&device->dev_list, &fs_devices->devices); device->devid = devid; - device->work.func = pending_bios_fn; + INIT_WORK(&device->work, pending_bios_fn); device->fs_devices = fs_devices; device->missing = 1; fs_devices->num_devices++; diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 8670558..bd849cc 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -91,7 +91,7 @@ struct btrfs_device { /* per-device scrub information */ struct scrub_ctx *scrub_device; - struct btrfs_work work; + struct work_struct work; struct rcu_head rcu; struct work_struct rcu_work; -- 1.8.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
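The requeue behaviour point 1) leans on is a stock property of the
kernel workqueue: a work function may queue_work() its own item again,
because the item is taken off the pending list before the function runs.
A minimal sketch under that assumption; the demo_* names are
illustrative, not from the btrfs sources:

#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;

static void demo_fn(struct work_struct *work)
{
	static int passes;

	/* do one bounded slice of the job here ... */
	if (++passes < 3) {
		/*
		 * Legal: the item was marked "not pending" before demo_fn
		 * ran, so queue_work() accepts it again. This mirrors how
		 * the converted run_scheduled_bios() yields and resumes.
		 */
		queue_work(demo_wq, work);
	}
}

Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 8/9] btrfs: Cleanup the old btrfs workqueue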
Since the patches before implemented the new kernel workqueue based btrfs_worqueue_struct, the old btrfs workqueue(btrfs_worker) can be removed without any problem. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> --- fs/btrfs/Makefile | 2 +- fs/btrfs/async-thread.c | 714 ------------------------------------------------ fs/btrfs/async-thread.h | 119 -------- fs/btrfs/ctree.h | 3 - fs/btrfs/dev-replace.c | 1 - fs/btrfs/disk-io.c | 25 +- fs/btrfs/raid56.c | 1 - fs/btrfs/relocation.c | 1 - fs/btrfs/super.c | 8 - fs/btrfs/volumes.c | 1 - fs/btrfs/volumes.h | 1 - 11 files changed, 10 insertions(+), 866 deletions(-) delete mode 100644 fs/btrfs/async-thread.c delete mode 100644 fs/btrfs/async-thread.h diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index d7439df..e2162af 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -5,7 +5,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ file-item.o inode-item.o inode-map.o disk-io.o \ transaction.o inode.o file.o tree-defrag.o \ extent_map.o sysfs.o struct-funcs.o xattr.o ordered-data.o \ - extent_io.o volumes.o async-thread.o ioctl.o locking.o orphan.o \ + extent_io.o volumes.o ioctl.o locking.o orphan.o \ export.o tree-log.o free-space-cache.o zlib.o lzo.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ diff --git a/fs/btrfs/async-thread.c b/fs/btrfs/async-thread.c deleted file mode 100644 index 58b7d14..0000000 --- a/fs/btrfs/async-thread.c +++ /dev/null @@ -1,714 +0,0 @@ -/* - * Copyright (C) 2007 Oracle. All rights reserved. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public - * License v2 as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public - * License along with this program; if not, write to the - * Free Software Foundation, Inc., 59 Temple Place - Suite 330, - * Boston, MA 021110-1307, USA. - */ - -#include <linux/kthread.h> -#include <linux/slab.h> -#include <linux/list.h> -#include <linux/spinlock.h> -#include <linux/freezer.h> -#include "async-thread.h" - -#define WORK_QUEUED_BIT 0 -#define WORK_DONE_BIT 1 -#define WORK_ORDER_DONE_BIT 2 -#define WORK_HIGH_PRIO_BIT 3 - -/* - * container for the kthread task pointer and the list of pending work - * One of these is allocated per thread. - */ -struct btrfs_worker_thread { - /* pool we belong to */ - struct btrfs_workers *workers; - - /* list of struct btrfs_work that are waiting for service */ - struct list_head pending; - struct list_head prio_pending; - - /* list of worker threads from struct btrfs_workers */ - struct list_head worker_list; - - /* kthread */ - struct task_struct *task; - - /* number of things on the pending list */ - atomic_t num_pending; - - /* reference counter for this struct */ - atomic_t refs; - - unsigned long sequence; - - /* protects the pending list. 
*/ - spinlock_t lock; - - /* set to non-zero when this thread is already awake and kicking */ - int working; - - /* are we currently idle */ - int idle; -}; - -static int __btrfs_start_workers(struct btrfs_workers *workers); - -/* - * btrfs_start_workers uses kthread_run, which can block waiting for memory - * for a very long time. It will actually throttle on page writeback, - * and so it may not make progress until after our btrfs worker threads - * process all of the pending work structs in their queue - * - * This means we can''t use btrfs_start_workers from inside a btrfs worker - * thread that is used as part of cleaning dirty memory, which pretty much - * involves all of the worker threads. - * - * Instead we have a helper queue who never has more than one thread - * where we scheduler thread start operations. This worker_start struct - * is used to contain the work and hold a pointer to the queue that needs - * another worker. - */ -struct worker_start { - struct btrfs_work work; - struct btrfs_workers *queue; -}; - -static void start_new_worker_func(struct btrfs_work *work) -{ - struct worker_start *start; - start = container_of(work, struct worker_start, work); - __btrfs_start_workers(start->queue); - kfree(start); -} - -/* - * helper function to move a thread onto the idle list after it - * has finished some requests. - */ -static void check_idle_worker(struct btrfs_worker_thread *worker) -{ - if (!worker->idle && atomic_read(&worker->num_pending) < - worker->workers->idle_thresh / 2) { - unsigned long flags; - spin_lock_irqsave(&worker->workers->lock, flags); - worker->idle = 1; - - /* the list may be empty if the worker is just starting */ - if (!list_empty(&worker->worker_list)) { - list_move(&worker->worker_list, - &worker->workers->idle_list); - } - spin_unlock_irqrestore(&worker->workers->lock, flags); - } -} - -/* - * helper function to move a thread off the idle list after new - * pending work is added. 
- */ -static void check_busy_worker(struct btrfs_worker_thread *worker) -{ - if (worker->idle && atomic_read(&worker->num_pending) >- worker->workers->idle_thresh) { - unsigned long flags; - spin_lock_irqsave(&worker->workers->lock, flags); - worker->idle = 0; - - if (!list_empty(&worker->worker_list)) { - list_move_tail(&worker->worker_list, - &worker->workers->worker_list); - } - spin_unlock_irqrestore(&worker->workers->lock, flags); - } -} - -static void check_pending_worker_creates(struct btrfs_worker_thread *worker) -{ - struct btrfs_workers *workers = worker->workers; - struct worker_start *start; - unsigned long flags; - - rmb(); - if (!workers->atomic_start_pending) - return; - - start = kzalloc(sizeof(*start), GFP_NOFS); - if (!start) - return; - - start->work.func = start_new_worker_func; - start->queue = workers; - - spin_lock_irqsave(&workers->lock, flags); - if (!workers->atomic_start_pending) - goto out; - - workers->atomic_start_pending = 0; - if (workers->num_workers + workers->num_workers_starting >- workers->max_workers) - goto out; - - workers->num_workers_starting += 1; - spin_unlock_irqrestore(&workers->lock, flags); - btrfs_queue_worker(workers->atomic_worker_start, &start->work); - return; - -out: - kfree(start); - spin_unlock_irqrestore(&workers->lock, flags); -} - -static noinline void run_ordered_completions(struct btrfs_workers *workers, - struct btrfs_work *work) -{ - if (!workers->ordered) - return; - - set_bit(WORK_DONE_BIT, &work->flags); - - spin_lock(&workers->order_lock); - - while (1) { - if (!list_empty(&workers->prio_order_list)) { - work = list_entry(workers->prio_order_list.next, - struct btrfs_work, order_list); - } else if (!list_empty(&workers->order_list)) { - work = list_entry(workers->order_list.next, - struct btrfs_work, order_list); - } else { - break; - } - if (!test_bit(WORK_DONE_BIT, &work->flags)) - break; - - /* we are going to call the ordered done function, but - * we leave the work item on the list as a barrier so - * that later work items that are done don''t have their - * functions called before this one returns - */ - if (test_and_set_bit(WORK_ORDER_DONE_BIT, &work->flags)) - break; - - spin_unlock(&workers->order_lock); - - work->ordered_func(work); - - /* now take the lock again and drop our item from the list */ - spin_lock(&workers->order_lock); - list_del(&work->order_list); - spin_unlock(&workers->order_lock); - - /* - * we don''t want to call the ordered free functions - * with the lock held though - */ - work->ordered_free(work); - spin_lock(&workers->order_lock); - } - - spin_unlock(&workers->order_lock); -} - -static void put_worker(struct btrfs_worker_thread *worker) -{ - if (atomic_dec_and_test(&worker->refs)) - kfree(worker); -} - -static int try_worker_shutdown(struct btrfs_worker_thread *worker) -{ - int freeit = 0; - - spin_lock_irq(&worker->lock); - spin_lock(&worker->workers->lock); - if (worker->workers->num_workers > 1 && - worker->idle && - !worker->working && - !list_empty(&worker->worker_list) && - list_empty(&worker->prio_pending) && - list_empty(&worker->pending) && - atomic_read(&worker->num_pending) == 0) { - freeit = 1; - list_del_init(&worker->worker_list); - worker->workers->num_workers--; - } - spin_unlock(&worker->workers->lock); - spin_unlock_irq(&worker->lock); - - if (freeit) - put_worker(worker); - return freeit; -} - -static struct btrfs_work *get_next_work(struct btrfs_worker_thread *worker, - struct list_head *prio_head, - struct list_head *head) -{ - struct btrfs_work *work = NULL; - struct 
list_head *cur = NULL; - - if(!list_empty(prio_head)) - cur = prio_head->next; - - smp_mb(); - if (!list_empty(&worker->prio_pending)) - goto refill; - - if (!list_empty(head)) - cur = head->next; - - if (cur) - goto out; - -refill: - spin_lock_irq(&worker->lock); - list_splice_tail_init(&worker->prio_pending, prio_head); - list_splice_tail_init(&worker->pending, head); - - if (!list_empty(prio_head)) - cur = prio_head->next; - else if (!list_empty(head)) - cur = head->next; - spin_unlock_irq(&worker->lock); - - if (!cur) - goto out_fail; - -out: - work = list_entry(cur, struct btrfs_work, list); - -out_fail: - return work; -} - -/* - * main loop for servicing work items - */ -static int worker_loop(void *arg) -{ - struct btrfs_worker_thread *worker = arg; - struct list_head head; - struct list_head prio_head; - struct btrfs_work *work; - - INIT_LIST_HEAD(&head); - INIT_LIST_HEAD(&prio_head); - - do { -again: - while (1) { - - - work = get_next_work(worker, &prio_head, &head); - if (!work) - break; - - list_del(&work->list); - clear_bit(WORK_QUEUED_BIT, &work->flags); - - work->worker = worker; - - work->func(work); - - atomic_dec(&worker->num_pending); - /* - * unless this is an ordered work queue, - * ''work'' was probably freed by func above. - */ - run_ordered_completions(worker->workers, work); - - check_pending_worker_creates(worker); - cond_resched(); - } - - spin_lock_irq(&worker->lock); - check_idle_worker(worker); - - if (freezing(current)) { - worker->working = 0; - spin_unlock_irq(&worker->lock); - try_to_freeze(); - } else { - spin_unlock_irq(&worker->lock); - if (!kthread_should_stop()) { - cpu_relax(); - /* - * we''ve dropped the lock, did someone else - * jump_in? - */ - smp_mb(); - if (!list_empty(&worker->pending) || - !list_empty(&worker->prio_pending)) - continue; - - /* - * this short schedule allows more work to - * come in without the queue functions - * needing to go through wake_up_process() - * - * worker->working is still 1, so nobody - * is going to try and wake us up - */ - schedule_timeout(1); - smp_mb(); - if (!list_empty(&worker->pending) || - !list_empty(&worker->prio_pending)) - continue; - - if (kthread_should_stop()) - break; - - /* still no more work?, sleep for real */ - spin_lock_irq(&worker->lock); - set_current_state(TASK_INTERRUPTIBLE); - if (!list_empty(&worker->pending) || - !list_empty(&worker->prio_pending)) { - spin_unlock_irq(&worker->lock); - set_current_state(TASK_RUNNING); - goto again; - } - - /* - * this makes sure we get a wakeup when someone - * adds something new to the queue - */ - worker->working = 0; - spin_unlock_irq(&worker->lock); - - if (!kthread_should_stop()) { - schedule_timeout(HZ * 120); - if (!worker->working && - try_worker_shutdown(worker)) { - return 0; - } - } - } - __set_current_state(TASK_RUNNING); - } - } while (!kthread_should_stop()); - return 0; -} - -/* - * this will wait for all the worker threads to shutdown - */ -void btrfs_stop_workers(struct btrfs_workers *workers) -{ - struct list_head *cur; - struct btrfs_worker_thread *worker; - int can_stop; - - spin_lock_irq(&workers->lock); - list_splice_init(&workers->idle_list, &workers->worker_list); - while (!list_empty(&workers->worker_list)) { - cur = workers->worker_list.next; - worker = list_entry(cur, struct btrfs_worker_thread, - worker_list); - - atomic_inc(&worker->refs); - workers->num_workers -= 1; - if (!list_empty(&worker->worker_list)) { - list_del_init(&worker->worker_list); - put_worker(worker); - can_stop = 1; - } else - can_stop = 0; - 
spin_unlock_irq(&workers->lock); - if (can_stop) - kthread_stop(worker->task); - spin_lock_irq(&workers->lock); - put_worker(worker); - } - spin_unlock_irq(&workers->lock); -} - -/* - * simple init on struct btrfs_workers - */ -void btrfs_init_workers(struct btrfs_workers *workers, char *name, int max, - struct btrfs_workers *async_helper) -{ - workers->num_workers = 0; - workers->num_workers_starting = 0; - INIT_LIST_HEAD(&workers->worker_list); - INIT_LIST_HEAD(&workers->idle_list); - INIT_LIST_HEAD(&workers->order_list); - INIT_LIST_HEAD(&workers->prio_order_list); - spin_lock_init(&workers->lock); - spin_lock_init(&workers->order_lock); - workers->max_workers = max; - workers->idle_thresh = 32; - workers->name = name; - workers->ordered = 0; - workers->atomic_start_pending = 0; - workers->atomic_worker_start = async_helper; -} - -/* - * starts new worker threads. This does not enforce the max worker - * count in case you need to temporarily go past it. - */ -static int __btrfs_start_workers(struct btrfs_workers *workers) -{ - struct btrfs_worker_thread *worker; - int ret = 0; - - worker = kzalloc(sizeof(*worker), GFP_NOFS); - if (!worker) { - ret = -ENOMEM; - goto fail; - } - - INIT_LIST_HEAD(&worker->pending); - INIT_LIST_HEAD(&worker->prio_pending); - INIT_LIST_HEAD(&worker->worker_list); - spin_lock_init(&worker->lock); - - atomic_set(&worker->num_pending, 0); - atomic_set(&worker->refs, 1); - worker->workers = workers; - worker->task = kthread_run(worker_loop, worker, - "btrfs-%s-%d", workers->name, - workers->num_workers + 1); - if (IS_ERR(worker->task)) { - ret = PTR_ERR(worker->task); - kfree(worker); - goto fail; - } - spin_lock_irq(&workers->lock); - list_add_tail(&worker->worker_list, &workers->idle_list); - worker->idle = 1; - workers->num_workers++; - workers->num_workers_starting--; - WARN_ON(workers->num_workers_starting < 0); - spin_unlock_irq(&workers->lock); - - return 0; -fail: - spin_lock_irq(&workers->lock); - workers->num_workers_starting--; - spin_unlock_irq(&workers->lock); - return ret; -} - -int btrfs_start_workers(struct btrfs_workers *workers) -{ - spin_lock_irq(&workers->lock); - workers->num_workers_starting++; - spin_unlock_irq(&workers->lock); - return __btrfs_start_workers(workers); -} - -/* - * run through the list and find a worker thread that doesn''t have a lot - * to do right now. This can return null if we aren''t yet at the thread - * count limit and all of the threads are busy. - */ -static struct btrfs_worker_thread *next_worker(struct btrfs_workers *workers) -{ - struct btrfs_worker_thread *worker; - struct list_head *next; - int enforce_min; - - enforce_min = (workers->num_workers + workers->num_workers_starting) < - workers->max_workers; - - /* - * if we find an idle thread, don''t move it to the end of the - * idle list. This improves the chance that the next submission - * will reuse the same thread, and maybe catch it while it is still - * working - */ - if (!list_empty(&workers->idle_list)) { - next = workers->idle_list.next; - worker = list_entry(next, struct btrfs_worker_thread, - worker_list); - return worker; - } - if (enforce_min || list_empty(&workers->worker_list)) - return NULL; - - /* - * if we pick a busy task, move the task to the end of the list. - * hopefully this will keep things somewhat evenly balanced. - * Do the move in batches based on the sequence number. This groups - * requests submitted at roughly the same time onto the same worker. 
- */ - next = workers->worker_list.next; - worker = list_entry(next, struct btrfs_worker_thread, worker_list); - worker->sequence++; - - if (worker->sequence % workers->idle_thresh == 0) - list_move_tail(next, &workers->worker_list); - return worker; -} - -/* - * selects a worker thread to take the next job. This will either find - * an idle worker, start a new worker up to the max count, or just return - * one of the existing busy workers. - */ -static struct btrfs_worker_thread *find_worker(struct btrfs_workers *workers) -{ - struct btrfs_worker_thread *worker; - unsigned long flags; - struct list_head *fallback; - int ret; - - spin_lock_irqsave(&workers->lock, flags); -again: - worker = next_worker(workers); - - if (!worker) { - if (workers->num_workers + workers->num_workers_starting >- workers->max_workers) { - goto fallback; - } else if (workers->atomic_worker_start) { - workers->atomic_start_pending = 1; - goto fallback; - } else { - workers->num_workers_starting++; - spin_unlock_irqrestore(&workers->lock, flags); - /* we''re below the limit, start another worker */ - ret = __btrfs_start_workers(workers); - spin_lock_irqsave(&workers->lock, flags); - if (ret) - goto fallback; - goto again; - } - } - goto found; - -fallback: - fallback = NULL; - /* - * we have failed to find any workers, just - * return the first one we can find. - */ - if (!list_empty(&workers->worker_list)) - fallback = workers->worker_list.next; - if (!list_empty(&workers->idle_list)) - fallback = workers->idle_list.next; - BUG_ON(!fallback); - worker = list_entry(fallback, - struct btrfs_worker_thread, worker_list); -found: - /* - * this makes sure the worker doesn''t exit before it is placed - * onto a busy/idle list - */ - atomic_inc(&worker->num_pending); - spin_unlock_irqrestore(&workers->lock, flags); - return worker; -} - -/* - * btrfs_requeue_work just puts the work item back on the tail of the list - * it was taken from. It is intended for use with long running work functions - * that make some progress and want to give the cpu up for others. 
- */ -void btrfs_requeue_work(struct btrfs_work *work) -{ - struct btrfs_worker_thread *worker = work->worker; - unsigned long flags; - int wake = 0; - - if (test_and_set_bit(WORK_QUEUED_BIT, &work->flags)) - return; - - spin_lock_irqsave(&worker->lock, flags); - if (test_bit(WORK_HIGH_PRIO_BIT, &work->flags)) - list_add_tail(&work->list, &worker->prio_pending); - else - list_add_tail(&work->list, &worker->pending); - atomic_inc(&worker->num_pending); - - /* by definition we''re busy, take ourselves off the idle - * list - */ - if (worker->idle) { - spin_lock(&worker->workers->lock); - worker->idle = 0; - list_move_tail(&worker->worker_list, - &worker->workers->worker_list); - spin_unlock(&worker->workers->lock); - } - if (!worker->working) { - wake = 1; - worker->working = 1; - } - - if (wake) - wake_up_process(worker->task); - spin_unlock_irqrestore(&worker->lock, flags); -} - -void btrfs_set_work_high_prio(struct btrfs_work *work) -{ - set_bit(WORK_HIGH_PRIO_BIT, &work->flags); -} - -/* - * places a struct btrfs_work into the pending queue of one of the kthreads - */ -void btrfs_queue_worker(struct btrfs_workers *workers, struct btrfs_work *work) -{ - struct btrfs_worker_thread *worker; - unsigned long flags; - int wake = 0; - - /* don''t requeue something already on a list */ - if (test_and_set_bit(WORK_QUEUED_BIT, &work->flags)) - return; - - worker = find_worker(workers); - if (workers->ordered) { - /* - * you''re not allowed to do ordered queues from an - * interrupt handler - */ - spin_lock(&workers->order_lock); - if (test_bit(WORK_HIGH_PRIO_BIT, &work->flags)) { - list_add_tail(&work->order_list, - &workers->prio_order_list); - } else { - list_add_tail(&work->order_list, &workers->order_list); - } - spin_unlock(&workers->order_lock); - } else { - INIT_LIST_HEAD(&work->order_list); - } - - spin_lock_irqsave(&worker->lock, flags); - - if (test_bit(WORK_HIGH_PRIO_BIT, &work->flags)) - list_add_tail(&work->list, &worker->prio_pending); - else - list_add_tail(&work->list, &worker->pending); - check_busy_worker(worker); - - /* - * avoid calling into wake_up_process if this thread has already - * been kicked - */ - if (!worker->working) - wake = 1; - worker->working = 1; - - if (wake) - wake_up_process(worker->task); - spin_unlock_irqrestore(&worker->lock, flags); -} diff --git a/fs/btrfs/async-thread.h b/fs/btrfs/async-thread.h deleted file mode 100644 index 063698b..0000000 --- a/fs/btrfs/async-thread.h +++ /dev/null @@ -1,119 +0,0 @@ -/* - * Copyright (C) 2007 Oracle. All rights reserved. - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public - * License v2 as published by the Free Software Foundation. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public - * License along with this program; if not, write to the - * Free Software Foundation, Inc., 59 Temple Place - Suite 330, - * Boston, MA 021110-1307, USA. - */ - -#ifndef __BTRFS_ASYNC_THREAD_ -#define __BTRFS_ASYNC_THREAD_ - -struct btrfs_worker_thread; - -/* - * This is similar to a workqueue, but it is meant to spread the operations - * across all available cpus instead of just the CPU that was used to - * queue the work. 
There is also some batching introduced to try and - * cut down on context switches. - * - * By default threads are added on demand up to 2 * the number of cpus. - * Changing struct btrfs_workers->max_workers is one way to prevent - * demand creation of kthreads. - * - * the basic model of these worker threads is to embed a btrfs_work - * structure in your own data struct, and use container_of in a - * work function to get back to your data struct. - */ -struct btrfs_work { - /* - * func should be set to the function you want called - * your work struct is passed as the only arg - * - * ordered_func must be set for work sent to an ordered work queue, - * and it is called to complete a given work item in the same - * order they were sent to the queue. - */ - void (*func)(struct btrfs_work *work); - void (*ordered_func)(struct btrfs_work *work); - void (*ordered_free)(struct btrfs_work *work); - - /* - * flags should be set to zero. It is used to make sure the - * struct is only inserted once into the list. - */ - unsigned long flags; - - /* don''t touch these */ - struct btrfs_worker_thread *worker; - struct list_head list; - struct list_head order_list; -}; - -struct btrfs_workers { - /* current number of running workers */ - int num_workers; - - int num_workers_starting; - - /* max number of workers allowed. changed by btrfs_start_workers */ - int max_workers; - - /* once a worker has this many requests or fewer, it is idle */ - int idle_thresh; - - /* force completions in the order they were queued */ - int ordered; - - /* more workers required, but in an interrupt handler */ - int atomic_start_pending; - - /* - * are we allowed to sleep while starting workers or are we required - * to start them at a later time? If we can''t sleep, this indicates - * which queue we need to use to schedule thread creation. - */ - struct btrfs_workers *atomic_worker_start; - - /* list with all the work threads. The workers on the idle thread - * may be actively servicing jobs, but they haven''t yet hit the - * idle thresh limit above. 
- */ - struct list_head worker_list; - struct list_head idle_list; - - /* - * when operating in ordered mode, this maintains the list - * of work items waiting for completion - */ - struct list_head order_list; - struct list_head prio_order_list; - - /* lock for finding the next worker thread to queue on */ - spinlock_t lock; - - /* lock for the ordered lists */ - spinlock_t order_lock; - - /* extra name for this worker, used for current->name */ - char *name; -}; - -void btrfs_queue_worker(struct btrfs_workers *workers, struct btrfs_work *work); -int btrfs_start_workers(struct btrfs_workers *workers); -void btrfs_stop_workers(struct btrfs_workers *workers); -void btrfs_init_workers(struct btrfs_workers *workers, char *name, int max, - struct btrfs_workers *async_starter); -void btrfs_requeue_work(struct btrfs_work *work); -void btrfs_set_work_high_prio(struct btrfs_work *work); -#endif diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 0cf4320..399e85b 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -34,7 +34,6 @@ #include <linux/btrfs.h> #include "extent_io.h" #include "extent_map.h" -#include "async-thread.h" struct btrfs_trans_handle; struct btrfs_transaction; @@ -1476,8 +1475,6 @@ struct btrfs_fs_info { * A third pool does submit_bio to avoid deadlocking with the other * two */ - struct btrfs_workers generic_worker; - struct btrfs_workqueue_struct *workers; struct btrfs_workqueue_struct *delalloc_workers; struct workqueue_struct *submit_workers; diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c index 4253ad5..9323126 100644 --- a/fs/btrfs/dev-replace.c +++ b/fs/btrfs/dev-replace.c @@ -33,7 +33,6 @@ #include "transaction.h" #include "print-tree.h" #include "volumes.h" -#include "async-thread.h" #include "check-integrity.h" #include "rcu-string.h" #include "dev-replace.h" diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ca2257d..a61e1fe 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -39,7 +39,6 @@ #include "btrfs_inode.h" #include "volumes.h" #include "print-tree.h" -#include "async-thread.h" #include "locking.h" #include "tree-log.h" #include "free-space-cache.h" @@ -1984,7 +1983,6 @@ static noinline int next_root_backup(struct btrfs_fs_info *info, /* helper to cleanup workers */ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) { - btrfs_stop_workers(&fs_info->generic_worker); btrfs_destroy_workqueue(fs_info->delalloc_workers); btrfs_destroy_workqueue(fs_info->workers); destroy_workqueue(fs_info->submit_workers); @@ -2458,8 +2456,6 @@ int open_ctree(struct super_block *sb, } max_active = fs_info->thread_pool_size; - btrfs_init_workers(&fs_info->generic_worker, - "genwork", 1, NULL); fs_info->workers = btrfs_alloc_workqueue("worker", "worker-ordered", "worker-high", max_active); @@ -2497,18 +2493,15 @@ int open_ctree(struct super_block *sb, * btrfs_start_workers can really only fail because of ENOMEM so just * return -ENOMEM if any of these fail. 
*/ - ret = btrfs_start_workers(&fs_info->generic_worker); - - if (ret || !(fs_info->flush_workers && fs_info->endio_workers && - fs_info->endio_meta_workers && fs_info->workers && - fs_info->delalloc_workers && - fs_info->submit_workers && - fs_info->endio_raid56_workers && - fs_info->rmw_workers && fs_info->qgroup_rescan_workers && - fs_info->endio_meta_write_workers && - fs_info->endio_write_workers && - fs_info->caching_workers && fs_info->readahead_workers && - fs_info->fixup_workers && fs_info->delayed_workers)) { + + if (!(fs_info->flush_workers && fs_info->endio_workers && + fs_info->endio_meta_workers && fs_info->workers && + fs_info->submit_workers && fs_info->endio_raid56_workers && + fs_info->rmw_workers && fs_info->delalloc_workers && + fs_info->endio_meta_write_workers && + fs_info->endio_write_workers && fs_info->caching_workers && + fs_info->readahead_workers && fs_info->fixup_workers && + fs_info->delayed_workers)) { err = -ENOMEM; goto fail_sb_buffer; } diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 4b7769d..04f3d5e 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -41,7 +41,6 @@ #include "print-tree.h" #include "volumes.h" #include "raid56.h" -#include "async-thread.h" #include "check-integrity.h" #include "rcu-string.h" diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 1209649..16b6db0 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -28,7 +28,6 @@ #include "volumes.h" #include "locking.h" #include "btrfs_inode.h" -#include "async-thread.h" #include "free-space-cache.h" #include "inode-map.h" diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 402b488..63e653c 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1156,13 +1156,6 @@ error_fs_info: return ERR_PTR(error); } -static void btrfs_set_max_workers(struct btrfs_workers *workers, int new_limit) -{ - spin_lock_irq(&workers->lock); - workers->max_workers = new_limit; - spin_unlock_irq(&workers->lock); -} - static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, int new_pool_size, int old_pool_size) { @@ -1174,7 +1167,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, printk(KERN_INFO "btrfs: resize thread pool %d -> %d\n", old_pool_size, new_pool_size); - btrfs_set_max_workers(&fs_info->generic_worker, new_pool_size); btrfs_workqueue_set_max(fs_info->workers, new_pool_size); btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size); workqueue_set_max_active(fs_info->submit_workers, new_pool_size); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index cb10e02..69c0661 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -35,7 +35,6 @@ #include "print-tree.h" #include "volumes.h" #include "raid56.h" -#include "async-thread.h" #include "check-integrity.h" #include "rcu-string.h" #include "math.h" diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index bd849cc..0efb041 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -22,7 +22,6 @@ #include <linux/bio.h> #include <linux/sort.h> #include <linux/btrfs.h> -#include "async-thread.h" #define BTRFS_STRIPE_LEN (64 * 1024) -- 1.8.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
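With btrfs_set_max_workers() removed above, pool resizing now goes
through workqueue_set_max_active() for the plain kernel workqueues and
btrfs_workqueue_set_max() for the wrapped ones. Assuming the internal
layout sketched after patch 6/9, the wrapper plausibly reduces to the
following; this is an assumption, not the posted bwq.c:

static void sketch_workqueue_set_max(struct btrfs_workqueue_struct *wq,
				     int max_active)
{
	workqueue_set_max_active(wq->normal, max_active);
	if (wq->high)
		workqueue_set_max_active(wq->high, max_active);
	/* the ordered queue runs one item at a time by construction */
}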
Qu Wenruo
2013-Sep-12 08:08 UTC
[PATCH v2 9/9] btrfs: Replace thread_pool_size with workqueue default value
The original btrfs_workers used fs_info->thread_pool_size as the
max_active value, and the previous patches followed suit. But the kernel
workqueue accepts 0 as "use the default max_active", and the workqueue
subsystem has its own threshold mechanism to avoid creating too many
threads, so we should rely on that default.

Since the thread_pool_size heuristic is no longer used, the related code
should be changed as well.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
---
 fs/btrfs/disk-io.c | 12 +++++++-----
 fs/btrfs/super.c   |  3 +--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index a61e1fe..0446d27 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -750,9 +750,11 @@ int btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio,
 
 unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info)
 {
-	unsigned long limit = min_t(unsigned long,
-				    info->thread_pool_size,
-				    info->fs_devices->open_devices);
+	unsigned long limit;
+	limit = info->thread_pool_size ?
+		min_t(unsigned long, info->thread_pool_size,
+		      info->fs_devices->open_devices) :
+		info->fs_devices->open_devices;
 	return 256 * limit;
 }
 
@@ -2191,8 +2193,8 @@ int open_ctree(struct super_block *sb,
 	INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT);
 	spin_lock_init(&fs_info->reada_lock);
 
-	fs_info->thread_pool_size = min_t(unsigned long,
-					  num_online_cpus() + 2, 8);
+	/* use the default value of kernel workqueue */
+	fs_info->thread_pool_size = 0;
 
 	INIT_LIST_HEAD(&fs_info->ordered_roots);
 	spin_lock_init(&fs_info->ordered_root_lock);
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 63e653c..ccf412f 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -898,8 +898,7 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry)
 	if (info->alloc_start != 0)
 		seq_printf(seq, ",alloc_start=%llu",
 			   (unsigned long long)info->alloc_start);
-	if (info->thread_pool_size !=  min_t(unsigned long,
-					     num_online_cpus() + 2, 8))
+	if (info->thread_pool_size)
 		seq_printf(seq, ",thread_pool=%d", info->thread_pool_size);
 	if (btrfs_test_opt(root, COMPRESS)) {
 		if (info->compress_type == BTRFS_COMPRESS_ZLIB)
-- 
1.8.4
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
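One detail worth spelling out: alloc_workqueue() documents
max_active == 0 as "use the default", which the implementation expands
to WQ_DFL_ACTIVE (WQ_MAX_ACTIVE / 2). That is why a thread_pool_size of
0 can be passed straight through every allocation site from the earlier
patches. Illustrative snippet only; the queue name is made up:

#include <linux/workqueue.h>

static struct workqueue_struct *demo_alloc(void)
{
	/* 0 => WQ_DFL_ACTIVE, letting the workqueue core pick the limit */
	return alloc_workqueue("demo", WQ_UNBOUND | WQ_MEM_RECLAIM, 0);
}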
David Sterba
2013-Sep-12 17:37 UTC
Re: [PATCH v2 0/9] btrfs: Replace the btrfs_workers with kernel workqueue
On Thu, Sep 12, 2013 at 04:08:15PM +0800, Qu Wenruo wrote:
> Use kernel workqueue and kernel workqueue based new btrfs_workqueue_struct to replace
> the old btrfs_workers.
> The main goal is to reduce the redundant codes(800 lines vs 200 lines) and
> try to get benefits from the latest workqueue changes.
>
> About the performance, the test suite I used is bonnie++,
> and there seems no significant regression.

You're replacing a core infrastructure building block, more testing is
absolutely required, but using the available infrastructure is a good
move.

I found a few things that do not replace the current implementation
one-to-one:

* the thread names lost the btrfs- prefix, which makes it hard to
  identify the processes, and we want this for either debugging or
  performance monitoring

* old high priority tasks were handled in threads with unchanged
  priority and just prioritized within the queue; the newly added
  WQ_HIGHPRI elevates the nice level of the thread, ie. it's not the
  same thing as before -- I need to look closer

* the idle_thresh attribute is not reflected in the new code, I don't
  know if the kernel workqueues have something equivalent

Other random comments:

* you can use the same files for the new helpers, instead of bwq.[ch]

* btrfs_workqueue_struct can drop the _struct suffix

* WQ_MEM_RECLAIM for the scrub thread does not seem right

* WQ_FREEZABLE should probably be set

david
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
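On the naming point, alloc_workqueue() accepts a printf-style format
string, so keeping the btrfs- prefix is cheap. A possible shape for
that, offered as a suggestion rather than anything in the posted series:

#include <linux/workqueue.h>

static struct workqueue_struct *btrfs_demo_alloc(const char *name,
						 unsigned int flags,
						 int max_active)
{
	/*
	 * With concurrency-managed workqueues the executing threads are
	 * shared kworkers; the queue name mainly surfaces in the rescuer
	 * thread (WQ_MEM_RECLAIM), sysfs and tracepoints.
	 */
	return alloc_workqueue("btrfs-%s", flags, max_active, name);
}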
Liu Bo
2013-Sep-13 01:29 UTC
Re: [PATCH v2 2/9] btrfs: use kernel workqueue to replace the btrfs_workers functions
On Thu, Sep 12, 2013 at 04:08:17PM +0800, Qu Wenruo wrote:> Use the kernel workqueue to replace the btrfs_workers which are only > used as normal workqueue. > > Other btrfs_workers will use some extra functions like requeue, high > priority and ordered work. > These btrfs_workers will not be touched in this patch. > > The followings are the untouched btrfs_workers: > > generic_worker: As the helper for other btrfs_workers > workers: Use the ordering and high priority features > delalloc_workers: Use the ordering feature > submit_workers: Use requeue feature > > All other workers can be replaced using the kernel workqueue directly.Interesting, I''ve been doing the same work for a while, but I''m still doing the tuning work on kerner wq + btrfs.> > Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> > --- > fs/btrfs/ctree.h | 39 +++++------ > fs/btrfs/delayed-inode.c | 9 ++- > fs/btrfs/disk-io.c | 164 ++++++++++++++++++----------------------------- > fs/btrfs/extent-tree.c | 6 +- > fs/btrfs/inode.c | 38 +++++------ > fs/btrfs/ordered-data.c | 11 ++-- > fs/btrfs/ordered-data.h | 4 +- > fs/btrfs/qgroup.c | 16 ++--- > fs/btrfs/raid56.c | 37 +++++------ > fs/btrfs/reada.c | 8 +-- > fs/btrfs/scrub.c | 84 ++++++++++++------------ > fs/btrfs/super.c | 23 ++++--- > 12 files changed, 196 insertions(+), 243 deletions(-) > > diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h > index e795bf1..0dd6ec9 100644 > --- a/fs/btrfs/ctree.h > +++ b/fs/btrfs/ctree.h > @@ -1202,7 +1202,7 @@ struct btrfs_caching_control { > struct list_head list; > struct mutex mutex; > wait_queue_head_t wait; > - struct btrfs_work work; > + struct work_struct work; > struct btrfs_block_group_cache *block_group; > u64 progress; > atomic_t count; > @@ -1479,25 +1479,26 @@ struct btrfs_fs_info { > struct btrfs_workers generic_worker; > struct btrfs_workers workers; > struct btrfs_workers delalloc_workers; > - struct btrfs_workers flush_workers; > - struct btrfs_workers endio_workers; > - struct btrfs_workers endio_meta_workers; > - struct btrfs_workers endio_raid56_workers; > - struct btrfs_workers rmw_workers; > - struct btrfs_workers endio_meta_write_workers; > - struct btrfs_workers endio_write_workers; > - struct btrfs_workers endio_freespace_worker; > struct btrfs_workers submit_workers; > - struct btrfs_workers caching_workers; > - struct btrfs_workers readahead_workers; > + > + struct workqueue_struct *flush_workers; > + struct workqueue_struct *endio_workers; > + struct workqueue_struct *endio_meta_workers; > + struct workqueue_struct *endio_raid56_workers; > + struct workqueue_struct *rmw_workers; > + struct workqueue_struct *endio_meta_write_workers; > + struct workqueue_struct *endio_write_workers; > + struct workqueue_struct *endio_freespace_worker; > + struct workqueue_struct *caching_workers; > + struct workqueue_struct *readahead_workers; > > /* > * fixup workers take dirty pages that didn''t properly go through > * the cow mechanism and make them safe to write. 
It happens > * for the sys_munmap function call path > */ > - struct btrfs_workers fixup_workers; > - struct btrfs_workers delayed_workers; > + struct workqueue_struct *fixup_workers; > + struct workqueue_struct *delayed_workers; > struct task_struct *transaction_kthread; > struct task_struct *cleaner_kthread; > int thread_pool_size; > @@ -1576,9 +1577,9 @@ struct btrfs_fs_info { > wait_queue_head_t scrub_pause_wait; > struct rw_semaphore scrub_super_lock; > int scrub_workers_refcnt; > - struct btrfs_workers scrub_workers; > - struct btrfs_workers scrub_wr_completion_workers; > - struct btrfs_workers scrub_nocow_workers; > + struct workqueue_struct *scrub_workers; > + struct workqueue_struct *scrub_wr_completion_workers; > + struct workqueue_struct *scrub_nocow_workers; > > #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY > u32 check_integrity_print_mask; > @@ -1619,9 +1620,9 @@ struct btrfs_fs_info { > /* qgroup rescan items */ > struct mutex qgroup_rescan_lock; /* protects the progress item */ > struct btrfs_key qgroup_rescan_progress; > - struct btrfs_workers qgroup_rescan_workers; > + struct workqueue_struct *qgroup_rescan_workers; > struct completion qgroup_rescan_completion; > - struct btrfs_work qgroup_rescan_work; > + struct work_struct qgroup_rescan_work; > > /* filesystem state */ > unsigned long fs_state; > @@ -3542,7 +3543,7 @@ struct btrfs_delalloc_work { > int delay_iput; > struct completion completion; > struct list_head list; > - struct btrfs_work work; > + struct work_struct work; > }; > > struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, > diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c > index 5615eac..2b8da0a7 100644 > --- a/fs/btrfs/delayed-inode.c > +++ b/fs/btrfs/delayed-inode.c > @@ -1258,10 +1258,10 @@ void btrfs_remove_delayed_node(struct inode *inode) > struct btrfs_async_delayed_work { > struct btrfs_delayed_root *delayed_root; > int nr; > - struct btrfs_work work; > + struct work_struct work; > }; > > -static void btrfs_async_run_delayed_root(struct btrfs_work *work) > +static void btrfs_async_run_delayed_root(struct work_struct *work) > { > struct btrfs_async_delayed_work *async_work; > struct btrfs_delayed_root *delayed_root; > @@ -1359,11 +1359,10 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root, > return -ENOMEM; > > async_work->delayed_root = delayed_root; > - async_work->work.func = btrfs_async_run_delayed_root; > - async_work->work.flags = 0; > + INIT_WORK(&async_work->work, btrfs_async_run_delayed_root); > async_work->nr = nr; > > - btrfs_queue_worker(&root->fs_info->delayed_workers, &async_work->work); > + queue_work(root->fs_info->delayed_workers, &async_work->work); > return 0; > } > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 3c2886c..d02a552 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -54,7 +54,7 @@ > #endif > > static struct extent_io_ops btree_extent_io_ops; > -static void end_workqueue_fn(struct btrfs_work *work); > +static void end_workqueue_fn(struct work_struct *work); > static void free_fs_root(struct btrfs_root *root); > static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info, > int read_only); > @@ -86,7 +86,7 @@ struct end_io_wq { > int error; > int metadata; > struct list_head list; > - struct btrfs_work work; > + struct work_struct work; > }; > > /* > @@ -692,31 +692,30 @@ static void end_workqueue_bio(struct bio *bio, int err) > > fs_info = end_io_wq->info; > end_io_wq->error = err; > - end_io_wq->work.func = 
end_workqueue_fn; > - end_io_wq->work.flags = 0; > + INIT_WORK(&end_io_wq->work, end_workqueue_fn); > > if (bio->bi_rw & REQ_WRITE) { > if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) > - btrfs_queue_worker(&fs_info->endio_meta_write_workers, > - &end_io_wq->work); > + queue_work(fs_info->endio_meta_write_workers, > + &end_io_wq->work); > else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE) > - btrfs_queue_worker(&fs_info->endio_freespace_worker, > - &end_io_wq->work); > + queue_work(fs_info->endio_freespace_worker, > + &end_io_wq->work); > else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) > - btrfs_queue_worker(&fs_info->endio_raid56_workers, > - &end_io_wq->work); > + queue_work(fs_info->endio_raid56_workers, > + &end_io_wq->work); > else > - btrfs_queue_worker(&fs_info->endio_write_workers, > - &end_io_wq->work); > + queue_work(fs_info->endio_write_workers, > + &end_io_wq->work); > } else { > if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) > - btrfs_queue_worker(&fs_info->endio_raid56_workers, > + queue_work(fs_info->endio_raid56_workers, > &end_io_wq->work); > else if (end_io_wq->metadata) > - btrfs_queue_worker(&fs_info->endio_meta_workers, > + queue_work(fs_info->endio_meta_workers, > &end_io_wq->work); > else > - btrfs_queue_worker(&fs_info->endio_workers, > + queue_work(fs_info->endio_workers, > &end_io_wq->work); > } > } > @@ -1662,7 +1661,7 @@ static int setup_bdi(struct btrfs_fs_info *info, struct backing_dev_info *bdi) > * called by the kthread helper functions to finally call the bio end_io > * functions. This is where read checksum verification actually happens > */ > -static void end_workqueue_fn(struct btrfs_work *work) > +static void end_workqueue_fn(struct work_struct *work) > { > struct bio *bio; > struct end_io_wq *end_io_wq; > @@ -1987,22 +1986,22 @@ static noinline int next_root_backup(struct btrfs_fs_info *info, > static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) > { > btrfs_stop_workers(&fs_info->generic_worker); > - btrfs_stop_workers(&fs_info->fixup_workers); > btrfs_stop_workers(&fs_info->delalloc_workers); > btrfs_stop_workers(&fs_info->workers); > - btrfs_stop_workers(&fs_info->endio_workers); > - btrfs_stop_workers(&fs_info->endio_meta_workers); > - btrfs_stop_workers(&fs_info->endio_raid56_workers); > - btrfs_stop_workers(&fs_info->rmw_workers); > - btrfs_stop_workers(&fs_info->endio_meta_write_workers); > - btrfs_stop_workers(&fs_info->endio_write_workers); > - btrfs_stop_workers(&fs_info->endio_freespace_worker); > btrfs_stop_workers(&fs_info->submit_workers); > - btrfs_stop_workers(&fs_info->delayed_workers); > - btrfs_stop_workers(&fs_info->caching_workers); > - btrfs_stop_workers(&fs_info->readahead_workers); > - btrfs_stop_workers(&fs_info->flush_workers); > - btrfs_stop_workers(&fs_info->qgroup_rescan_workers); > + destroy_workqueue(fs_info->fixup_workers); > + destroy_workqueue(fs_info->endio_workers); > + destroy_workqueue(fs_info->endio_meta_workers); > + destroy_workqueue(fs_info->endio_raid56_workers); > + destroy_workqueue(fs_info->rmw_workers); > + destroy_workqueue(fs_info->endio_meta_write_workers); > + destroy_workqueue(fs_info->endio_write_workers); > + destroy_workqueue(fs_info->endio_freespace_worker); > + destroy_workqueue(fs_info->delayed_workers); > + destroy_workqueue(fs_info->caching_workers); > + destroy_workqueue(fs_info->readahead_workers); > + destroy_workqueue(fs_info->flush_workers); > + destroy_workqueue(fs_info->qgroup_rescan_workers); > } > > /* helper to cleanup tree roots */ > @@ -2099,6 
+2098,8 @@ int open_ctree(struct super_block *sb, > struct btrfs_root *quota_root; > struct btrfs_root *log_tree_root; > int ret; > + int max_active; > + int flags = WQ_UNBOUND | WQ_MEM_RECLAIM;

Have you tried that without WQ_UNBOUND? IMO, kernel wq's biggest benefit is its concurrency management by hooking into the scheduler, but UNBOUND just disables it. In my patch, I only set a few wq with WQ_UNBOUND.

> int err = -EINVAL; > int num_backups_tried = 0; > int backup_index = 0; > @@ -2457,6 +2458,7 @@ int open_ctree(struct super_block *sb, > goto fail_alloc; > } > > + max_active = fs_info->thread_pool_size;

For btrfs wq, 'max_active' is used as a maximum limit of worker helpers, while for kernel wq, 'max_active' refers to at most how many work items of the wq can be executing at the same time per CPU. I don't think @thread_pool_size is properly used here.

thanks, -liubo

> btrfs_init_workers(&fs_info->generic_worker, > "genwork", 1, NULL); > > @@ -2468,23 +2470,13 @@ int open_ctree(struct super_block *sb, > fs_info->thread_pool_size, > &fs_info->generic_worker); > > - btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc", > - fs_info->thread_pool_size, > - &fs_info->generic_worker); > - > + fs_info->flush_workers = alloc_workqueue("flush_delalloc", flags, > + max_active); > btrfs_init_workers(&fs_info->submit_workers, "submit", > min_t(u64, fs_devices->num_devices, > fs_info->thread_pool_size), > &fs_info->generic_worker); > - > - btrfs_init_workers(&fs_info->caching_workers, "cache", > - 2, &fs_info->generic_worker); > - > - /* a higher idle thresh on the submit workers makes it much more > - * likely that bios will be send down in a sane order to the > - * devices > - */ > - fs_info->submit_workers.idle_thresh = 64; > + fs_info->caching_workers = alloc_workqueue("cache", flags, 2); > > fs_info->workers.idle_thresh = 16; > fs_info->workers.ordered = 1; > @@ -2492,72 +2484,42 @@ int open_ctree(struct super_block *sb, > fs_info->delalloc_workers.idle_thresh = 2; > fs_info->delalloc_workers.ordered = 1; > > - btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1, > - &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->endio_workers, "endio", > - fs_info->thread_pool_size, > - &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->endio_meta_workers, "endio-meta", > - fs_info->thread_pool_size, > - &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->endio_meta_write_workers, > - "endio-meta-write", fs_info->thread_pool_size, > - &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->endio_raid56_workers, > - "endio-raid56", fs_info->thread_pool_size, > - &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->rmw_workers, > - "rmw", fs_info->thread_pool_size, > - &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->endio_write_workers, "endio-write", > - fs_info->thread_pool_size, > - &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->endio_freespace_worker, "freespace-write", > - 1, &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->delayed_workers, "delayed-meta", > - fs_info->thread_pool_size, > - &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->readahead_workers, "readahead", > - fs_info->thread_pool_size, > - &fs_info->generic_worker); > - btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1, > - &fs_info->generic_worker); > - > - /* > - * endios are largely parallel and should have a very > - * low idle thresh > - */ > - fs_info->endio_workers.idle_thresh = 4; > - 
fs_info->endio_meta_workers.idle_thresh = 4; > - fs_info->endio_raid56_workers.idle_thresh = 4; > - fs_info->rmw_workers.idle_thresh = 2; > - > - fs_info->endio_write_workers.idle_thresh = 2; > - fs_info->endio_meta_write_workers.idle_thresh = 2; > - fs_info->readahead_workers.idle_thresh = 2; > - > + fs_info->fixup_workers = alloc_workqueue("fixup", flags, 1); > + fs_info->endio_workers = alloc_workqueue("endio", flags, max_active); > + fs_info->endio_meta_workers = alloc_workqueue("endio-meta", flags, > + max_active); > + fs_info->endio_meta_write_workers = alloc_workqueue("endio-meta-write", > + flags, max_active); > + fs_info->endio_raid56_workers = alloc_workqueue("endio-raid56", flags, > + max_active); > + fs_info->rmw_workers = alloc_workqueue("rmw", flags, max_active); > + fs_info->endio_write_workers = alloc_workqueue("endio-write", flags, > + max_active); > + fs_info->endio_freespace_worker = alloc_workqueue("freespace-write", > + flags, 1); > + fs_info->delayed_workers = alloc_workqueue("delayed_meta", flags, > + max_active); > + fs_info->readahead_workers = alloc_workqueue("readahead", flags, > + max_active); > + fs_info->qgroup_rescan_workers = alloc_workqueue("group-rescan", > + flags, 1); > /* > * btrfs_start_workers can really only fail because of ENOMEM so just > * return -ENOMEM if any of these fail. > */ > ret = btrfs_start_workers(&fs_info->workers); > ret |= btrfs_start_workers(&fs_info->generic_worker); > - ret |= btrfs_start_workers(&fs_info->submit_workers); > ret |= btrfs_start_workers(&fs_info->delalloc_workers); > - ret |= btrfs_start_workers(&fs_info->fixup_workers); > - ret |= btrfs_start_workers(&fs_info->endio_workers); > - ret |= btrfs_start_workers(&fs_info->endio_meta_workers); > - ret |= btrfs_start_workers(&fs_info->rmw_workers); > - ret |= btrfs_start_workers(&fs_info->endio_raid56_workers); > - ret |= btrfs_start_workers(&fs_info->endio_meta_write_workers); > - ret |= btrfs_start_workers(&fs_info->endio_write_workers); > - ret |= btrfs_start_workers(&fs_info->endio_freespace_worker); > - ret |= btrfs_start_workers(&fs_info->delayed_workers); > - ret |= btrfs_start_workers(&fs_info->caching_workers); > - ret |= btrfs_start_workers(&fs_info->readahead_workers); > - ret |= btrfs_start_workers(&fs_info->flush_workers); > - ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers); > - if (ret) { > + ret |= btrfs_start_workers(&fs_info->submit_workers); > + > + if (ret || !(fs_info->flush_workers && fs_info->endio_workers && > + fs_info->endio_meta_workers && > + fs_info->endio_raid56_workers && > + fs_info->rmw_workers && fs_info->qgroup_rescan_workers && > + fs_info->endio_meta_write_workers && > + fs_info->endio_write_workers && > + fs_info->caching_workers && fs_info->readahead_workers && > + fs_info->fixup_workers && fs_info->delayed_workers)) { > err = -ENOMEM; > goto fail_sb_buffer; > } > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 0236de7..c8f67d9 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -377,7 +377,7 @@ static u64 add_new_free_space(struct btrfs_block_group_cache *block_group, > return total_added; > } > > -static noinline void caching_thread(struct btrfs_work *work) > +static noinline void caching_thread(struct work_struct *work) > { > struct btrfs_block_group_cache *block_group; > struct btrfs_fs_info *fs_info; > @@ -530,7 +530,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache, > caching_ctl->block_group = cache; > caching_ctl->progress = cache->key.objectid; > 
atomic_set(&caching_ctl->count, 1); > - caching_ctl->work.func = caching_thread; > + INIT_WORK(&caching_ctl->work, caching_thread); > > spin_lock(&cache->lock); > /* > @@ -621,7 +621,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache, > > btrfs_get_block_group(cache); > > - btrfs_queue_worker(&fs_info->caching_workers, &caching_ctl->work); > + queue_work(fs_info->caching_workers, &caching_ctl->work); > > return ret; > } > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index b7c2487..53901a5 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1818,10 +1818,10 @@ int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, > /* see btrfs_writepage_start_hook for details on why this is required */ > struct btrfs_writepage_fixup { > struct page *page; > - struct btrfs_work work; > + struct work_struct work; > }; > > -static void btrfs_writepage_fixup_worker(struct btrfs_work *work) > +static void btrfs_writepage_fixup_worker(struct work_struct *work) > { > struct btrfs_writepage_fixup *fixup; > struct btrfs_ordered_extent *ordered; > @@ -1912,9 +1912,9 @@ static int btrfs_writepage_start_hook(struct page *page, u64 start, u64 end) > > SetPageChecked(page); > page_cache_get(page); > - fixup->work.func = btrfs_writepage_fixup_worker; > + INIT_WORK(&fixup->work, btrfs_writepage_fixup_worker); > fixup->page = page; > - btrfs_queue_worker(&root->fs_info->fixup_workers, &fixup->work); > + queue_work(root->fs_info->fixup_workers, &fixup->work); > return -EBUSY; > } > > @@ -2780,7 +2780,7 @@ out: > return ret; > } > > -static void finish_ordered_fn(struct btrfs_work *work) > +static void finish_ordered_fn(struct work_struct *work) > { > struct btrfs_ordered_extent *ordered_extent; > ordered_extent = container_of(work, struct btrfs_ordered_extent, work); > @@ -2793,7 +2793,7 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end, > struct inode *inode = page->mapping->host; > struct btrfs_root *root = BTRFS_I(inode)->root; > struct btrfs_ordered_extent *ordered_extent = NULL; > - struct btrfs_workers *workers; > + struct workqueue_struct *workers; > > trace_btrfs_writepage_end_io_hook(page, start, end, uptodate); > > @@ -2802,14 +2802,13 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end, > end - start + 1, uptodate)) > return 0; > > - ordered_extent->work.func = finish_ordered_fn; > - ordered_extent->work.flags = 0; > + INIT_WORK(&ordered_extent->work, finish_ordered_fn); > > if (btrfs_is_free_space_inode(inode)) > - workers = &root->fs_info->endio_freespace_worker; > + workers = root->fs_info->endio_freespace_worker; > else > - workers = &root->fs_info->endio_write_workers; > - btrfs_queue_worker(workers, &ordered_extent->work); > + workers = root->fs_info->endio_write_workers; > + queue_work(workers, &ordered_extent->work); > > return 0; > } > @@ -6906,10 +6905,9 @@ again: > if (!ret) > goto out_test; > > - ordered->work.func = finish_ordered_fn; > - ordered->work.flags = 0; > - btrfs_queue_worker(&root->fs_info->endio_write_workers, > - &ordered->work); > + INIT_WORK(&ordered->work, finish_ordered_fn); > + queue_work(root->fs_info->endio_write_workers, &ordered->work); > + > out_test: > /* > * our bio might span multiple ordered extents. 
If we haven't > @@ -8187,7 +8185,7 @@ out_notrans: > return ret; > } > > -static void btrfs_run_delalloc_work(struct btrfs_work *work) > +static void btrfs_run_delalloc_work(struct work_struct *work) > { > struct btrfs_delalloc_work *delalloc_work; > > @@ -8206,7 +8204,7 @@ static void btrfs_run_delalloc_work(struct btrfs_work *work) > } > > struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, > - int wait, int delay_iput) > + int wait, int delay_iput) > { > struct btrfs_delalloc_work *work; > > @@ -8219,8 +8217,7 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, > work->inode = inode; > work->wait = wait; > work->delay_iput = delay_iput; > - work->work.func = btrfs_run_delalloc_work; > - > + INIT_WORK(&work->work, btrfs_run_delalloc_work); > return work; > } > > @@ -8267,8 +8264,7 @@ static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput) > goto out; > } > list_add_tail(&work->list, &works); > - btrfs_queue_worker(&root->fs_info->flush_workers, > - &work->work); > + queue_work(root->fs_info->flush_workers, &work->work); > > cond_resched(); > spin_lock(&root->delalloc_lock); > diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c > index 8136982..9b5ccac 100644 > --- a/fs/btrfs/ordered-data.c > +++ b/fs/btrfs/ordered-data.c > @@ -552,7 +552,7 @@ void btrfs_remove_ordered_extent(struct inode *inode, > wake_up(&entry->wait); > } > > -static void btrfs_run_ordered_extent_work(struct btrfs_work *work) > +static void btrfs_run_ordered_extent_work(struct work_struct *work) > { > struct btrfs_ordered_extent *ordered; > > @@ -594,10 +594,9 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root, int delay_iput) > atomic_inc(&ordered->refs); > spin_unlock(&root->ordered_extent_lock); > > - ordered->flush_work.func = btrfs_run_ordered_extent_work; > + INIT_WORK(&ordered->flush_work, btrfs_run_ordered_extent_work); > list_add_tail(&ordered->work_list, &works); > - btrfs_queue_worker(&root->fs_info->flush_workers, > - &ordered->flush_work); > + queue_work(root->fs_info->flush_workers, &ordered->flush_work); > > cond_resched(); > spin_lock(&root->ordered_extent_lock); > @@ -706,8 +705,8 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans, > goto out; > } > list_add_tail(&work->list, &works); > - btrfs_queue_worker(&root->fs_info->flush_workers, > - &work->work); > + queue_work(root->fs_info->flush_workers, > + &work->work); > > cond_resched(); > spin_lock(&root->fs_info->ordered_root_lock); > diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h > index 68844d5..f4c81d7 100644 > --- a/fs/btrfs/ordered-data.h > +++ b/fs/btrfs/ordered-data.h > @@ -123,10 +123,10 @@ struct btrfs_ordered_extent { > /* a per root list of all the pending ordered extents */ > struct list_head root_extent_list; > > - struct btrfs_work work; > + struct work_struct work; > > struct completion completion; > - struct btrfs_work flush_work; > + struct work_struct flush_work; > struct list_head work_list; > }; > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c > index 1280eff..a49fdfe 100644 > --- a/fs/btrfs/qgroup.c > +++ b/fs/btrfs/qgroup.c > @@ -1528,8 +1528,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans, > ret = qgroup_rescan_init(fs_info, 0, 1); > if (!ret) { > qgroup_rescan_zero_tracking(fs_info); > - btrfs_queue_worker(&fs_info->qgroup_rescan_workers, > - &fs_info->qgroup_rescan_work); > + queue_work(fs_info->qgroup_rescan_workers, > + &fs_info->qgroup_rescan_work); > } > ret = 0; > } > @@ -1994,7 
+1994,7 @@ out: > return ret; > } > > -static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) > +static void btrfs_qgroup_rescan_worker(struct work_struct *work) > { > struct btrfs_fs_info *fs_info = container_of(work, struct btrfs_fs_info, > qgroup_rescan_work); > @@ -2105,7 +2105,7 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, > > memset(&fs_info->qgroup_rescan_work, 0, > sizeof(fs_info->qgroup_rescan_work)); > - fs_info->qgroup_rescan_work.func = btrfs_qgroup_rescan_worker; > + INIT_WORK(&fs_info->qgroup_rescan_work, btrfs_qgroup_rescan_worker); > > if (ret) { > err: > @@ -2168,8 +2168,8 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info) > > qgroup_rescan_zero_tracking(fs_info); > > - btrfs_queue_worker(&fs_info->qgroup_rescan_workers, > - &fs_info->qgroup_rescan_work); > + queue_work(fs_info->qgroup_rescan_workers, > + &fs_info->qgroup_rescan_work); > > return 0; > } > @@ -2200,6 +2200,6 @@ void > btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info) > { > if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) > - btrfs_queue_worker(&fs_info->qgroup_rescan_workers, > - &fs_info->qgroup_rescan_work); > + queue_work(fs_info->qgroup_rescan_workers, > + &fs_info->qgroup_rescan_work); > } > diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c > index 0525e13..4b7769d 100644 > --- a/fs/btrfs/raid56.c > +++ b/fs/btrfs/raid56.c > @@ -88,7 +88,7 @@ struct btrfs_raid_bio { > /* > * for scheduling work in the helper threads > */ > - struct btrfs_work work; > + struct work_struct work; > > /* > * bio list and bio_list_lock are used > @@ -167,8 +167,8 @@ struct btrfs_raid_bio { > > static int __raid56_parity_recover(struct btrfs_raid_bio *rbio); > static noinline void finish_rmw(struct btrfs_raid_bio *rbio); > -static void rmw_work(struct btrfs_work *work); > -static void read_rebuild_work(struct btrfs_work *work); > +static void rmw_work(struct work_struct *work); > +static void read_rebuild_work(struct work_struct *work); > static void async_rmw_stripe(struct btrfs_raid_bio *rbio); > static void async_read_rebuild(struct btrfs_raid_bio *rbio); > static int fail_bio_stripe(struct btrfs_raid_bio *rbio, struct bio *bio); > @@ -1417,20 +1417,16 @@ cleanup: > > static void async_rmw_stripe(struct btrfs_raid_bio *rbio) > { > - rbio->work.flags = 0; > - rbio->work.func = rmw_work; > - > - btrfs_queue_worker(&rbio->fs_info->rmw_workers, > - &rbio->work); > + INIT_WORK(&rbio->work, rmw_work); > + queue_work(rbio->fs_info->rmw_workers, > + &rbio->work); > } > > static void async_read_rebuild(struct btrfs_raid_bio *rbio) > { > - rbio->work.flags = 0; > - rbio->work.func = read_rebuild_work; > - > - btrfs_queue_worker(&rbio->fs_info->rmw_workers, > - &rbio->work); > + INIT_WORK(&rbio->work, read_rebuild_work); > + queue_work(rbio->fs_info->rmw_workers, > + &rbio->work); > } > > /* > @@ -1589,7 +1585,7 @@ struct btrfs_plug_cb { > struct blk_plug_cb cb; > struct btrfs_fs_info *info; > struct list_head rbio_list; > - struct btrfs_work work; > + struct work_struct work; > }; > > /* > @@ -1653,7 +1649,7 @@ static void run_plug(struct btrfs_plug_cb *plug) > * if the unplug comes from schedule, we have to push the > * work off to a helper thread > */ > -static void unplug_work(struct btrfs_work *work) > +static void unplug_work(struct work_struct *work) > { > struct btrfs_plug_cb *plug; > plug = container_of(work, struct btrfs_plug_cb, work); > @@ -1666,10 +1662,9 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule) > plug = container_of(cb, 
struct btrfs_plug_cb, cb); > > if (from_schedule) { > - plug->work.flags = 0; > - plug->work.func = unplug_work; > - btrfs_queue_worker(&plug->info->rmw_workers, > - &plug->work); > + INIT_WORK(&plug->work, unplug_work); > + queue_work(plug->info->rmw_workers, > + &plug->work); > return; > } > run_plug(plug); > @@ -2083,7 +2078,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio, > > } > > -static void rmw_work(struct btrfs_work *work) > +static void rmw_work(struct work_struct *work) > { > struct btrfs_raid_bio *rbio; > > @@ -2091,7 +2086,7 @@ static void rmw_work(struct btrfs_work *work) > raid56_rmw_stripe(rbio); > } > > -static void read_rebuild_work(struct btrfs_work *work) > +static void read_rebuild_work(struct work_struct *work) > { > struct btrfs_raid_bio *rbio; > > diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c > index 1031b69..9607648 100644 > --- a/fs/btrfs/reada.c > +++ b/fs/btrfs/reada.c > @@ -91,7 +91,7 @@ struct reada_zone { > }; > > struct reada_machine_work { > - struct btrfs_work work; > + struct work_struct work; > struct btrfs_fs_info *fs_info; > }; > > @@ -732,7 +732,7 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info, > > } > > -static void reada_start_machine_worker(struct btrfs_work *work) > +static void reada_start_machine_worker(struct work_struct *work) > { > struct reada_machine_work *rmw; > struct btrfs_fs_info *fs_info; > @@ -792,10 +792,10 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info) > /* FIXME we cannot handle this properly right now */ > BUG(); > } > - rmw->work.func = reada_start_machine_worker; > + INIT_WORK(&rmw->work, reada_start_machine_worker); > rmw->fs_info = fs_info; > > - btrfs_queue_worker(&fs_info->readahead_workers, &rmw->work); > + queue_work(fs_info->readahead_workers, &rmw->work); > } > > #ifdef DEBUG > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index 4ba2a69..025bb53 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -96,7 +96,7 @@ struct scrub_bio { > #endif > int page_count; > int next_free; > - struct btrfs_work work; > + struct work_struct work; > }; > > struct scrub_block { > @@ -154,7 +154,7 @@ struct scrub_fixup_nodatasum { > struct btrfs_device *dev; > u64 logical; > struct btrfs_root *root; > - struct btrfs_work work; > + struct work_struct work; > int mirror_num; > }; > > @@ -164,7 +164,7 @@ struct scrub_copy_nocow_ctx { > u64 len; > int mirror_num; > u64 physical_for_dev_replace; > - struct btrfs_work work; > + struct work_struct work; > }; > > struct scrub_warning { > @@ -224,7 +224,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len, > u64 gen, int mirror_num, u8 *csum, int force, > u64 physical_for_dev_replace); > static void scrub_bio_end_io(struct bio *bio, int err); > -static void scrub_bio_end_io_worker(struct btrfs_work *work); > +static void scrub_bio_end_io_worker(struct work_struct *work); > static void scrub_block_complete(struct scrub_block *sblock); > static void scrub_remap_extent(struct btrfs_fs_info *fs_info, > u64 extent_logical, u64 extent_len, > @@ -241,14 +241,14 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, > struct scrub_page *spage); > static void scrub_wr_submit(struct scrub_ctx *sctx); > static void scrub_wr_bio_end_io(struct bio *bio, int err); > -static void scrub_wr_bio_end_io_worker(struct btrfs_work *work); > +static void scrub_wr_bio_end_io_worker(struct work_struct *work); > static int write_page_nocow(struct scrub_ctx *sctx, > u64 physical_for_dev_replace, struct page *page); > 
static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root, > void *ctx); > static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len, > int mirror_num, u64 physical_for_dev_replace); > -static void copy_nocow_pages_worker(struct btrfs_work *work); > +static void copy_nocow_pages_worker(struct work_struct *work); > > > static void scrub_pending_bio_inc(struct scrub_ctx *sctx) > @@ -386,7 +386,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace) > sbio->index = i; > sbio->sctx = sctx; > sbio->page_count = 0; > - sbio->work.func = scrub_bio_end_io_worker; > + INIT_WORK(&sbio->work, scrub_bio_end_io_worker); > > if (i != SCRUB_BIOS_PER_SCTX - 1) > sctx->bios[i]->next_free = i + 1; > @@ -691,7 +691,7 @@ out: > return -EIO; > } > > -static void scrub_fixup_nodatasum(struct btrfs_work *work) > +static void scrub_fixup_nodatasum(struct work_struct *work) > { > int ret; > struct scrub_fixup_nodatasum *fixup; > @@ -956,9 +956,8 @@ nodatasum_case: > fixup_nodatasum->root = fs_info->extent_root; > fixup_nodatasum->mirror_num = failed_mirror_index + 1; > scrub_pending_trans_workers_inc(sctx); > - fixup_nodatasum->work.func = scrub_fixup_nodatasum; > - btrfs_queue_worker(&fs_info->scrub_workers, > - &fixup_nodatasum->work); > + INIT_WORK(&fixup_nodatasum->work, scrub_fixup_nodatasum); > + queue_work(fs_info->scrub_workers, &fixup_nodatasum->work); > goto out; > } > > @@ -1592,11 +1591,11 @@ static void scrub_wr_bio_end_io(struct bio *bio, int err) > sbio->err = err; > sbio->bio = bio; > > - sbio->work.func = scrub_wr_bio_end_io_worker; > - btrfs_queue_worker(&fs_info->scrub_wr_completion_workers, &sbio->work); > + INIT_WORK(&sbio->work, scrub_wr_bio_end_io_worker); > + queue_work(fs_info->scrub_wr_completion_workers, &sbio->work); > } > > -static void scrub_wr_bio_end_io_worker(struct btrfs_work *work) > +static void scrub_wr_bio_end_io_worker(struct work_struct *work) > { > struct scrub_bio *sbio = container_of(work, struct scrub_bio, work); > struct scrub_ctx *sctx = sbio->sctx; > @@ -2061,10 +2060,10 @@ static void scrub_bio_end_io(struct bio *bio, int err) > sbio->err = err; > sbio->bio = bio; > > - btrfs_queue_worker(&fs_info->scrub_workers, &sbio->work); > + queue_work(fs_info->scrub_workers, &sbio->work); > } > > -static void scrub_bio_end_io_worker(struct btrfs_work *work) > +static void scrub_bio_end_io_worker(struct work_struct *work) > { > struct scrub_bio *sbio = container_of(work, struct scrub_bio, work); > struct scrub_ctx *sctx = sbio->sctx; > @@ -2778,34 +2777,33 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info, > int is_dev_replace) > { > int ret = 0; > + int flags = WQ_UNBOUND | WQ_MEM_RECLAIM; > + int max_active = fs_info->thread_pool_size; > > mutex_lock(&fs_info->scrub_lock); > if (fs_info->scrub_workers_refcnt == 0) { > if (is_dev_replace) > - btrfs_init_workers(&fs_info->scrub_workers, "scrub", 1, > - &fs_info->generic_worker); > + fs_info->scrub_workers = > + alloc_workqueue("scrub", flags, 1); > else > - btrfs_init_workers(&fs_info->scrub_workers, "scrub", > - fs_info->thread_pool_size, > - &fs_info->generic_worker); > - fs_info->scrub_workers.idle_thresh = 4; > - ret = btrfs_start_workers(&fs_info->scrub_workers); > - if (ret) > + fs_info->scrub_workers = > + alloc_workqueue("scrub", flags, max_active); > + if (!fs_info->scrub_workers) { > + ret = -ENOMEM; > goto out; > - btrfs_init_workers(&fs_info->scrub_wr_completion_workers, > - "scrubwrc", > - fs_info->thread_pool_size, > - &fs_info->generic_worker); > - fs_info->scrub_wr_completion_workers.idle_thresh = 2; > - ret = btrfs_start_workers( > - &fs_info->scrub_wr_completion_workers); > - if (ret) > + } > + fs_info->scrub_wr_completion_workers = > + alloc_workqueue("scrubwrc", flags, max_active); > + if (!fs_info->scrub_wr_completion_workers) { > + ret = -ENOMEM; > goto out; > - btrfs_init_workers(&fs_info->scrub_nocow_workers, "scrubnc", 1, > - &fs_info->generic_worker); > - ret = btrfs_start_workers(&fs_info->scrub_nocow_workers); > - if (ret) > + } > + fs_info->scrub_nocow_workers = > + alloc_workqueue("scrubnc", flags, 1); > + if (!fs_info->scrub_nocow_workers) { > + ret = -ENOMEM; > goto out; > + } > } > ++fs_info->scrub_workers_refcnt; > out: > @@ -2818,9 +2816,9 @@ static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info) > { > mutex_lock(&fs_info->scrub_lock); > if (--fs_info->scrub_workers_refcnt == 0) { > - btrfs_stop_workers(&fs_info->scrub_workers); > - btrfs_stop_workers(&fs_info->scrub_wr_completion_workers); > - btrfs_stop_workers(&fs_info->scrub_nocow_workers); > + destroy_workqueue(fs_info->scrub_workers); > + destroy_workqueue(fs_info->scrub_wr_completion_workers); > + destroy_workqueue(fs_info->scrub_nocow_workers); > } > WARN_ON(fs_info->scrub_workers_refcnt < 0); > mutex_unlock(&fs_info->scrub_lock); > @@ -3130,14 +3128,14 @@ static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len, > nocow_ctx->len = len; > nocow_ctx->mirror_num = mirror_num; > nocow_ctx->physical_for_dev_replace = physical_for_dev_replace; > - nocow_ctx->work.func = copy_nocow_pages_worker; > - btrfs_queue_worker(&fs_info->scrub_nocow_workers, > - &nocow_ctx->work); > + INIT_WORK(&nocow_ctx->work, copy_nocow_pages_worker); > + queue_work(fs_info->scrub_nocow_workers, > + &nocow_ctx->work); > > return 0; > } > > -static void copy_nocow_pages_worker(struct btrfs_work *work) > +static void copy_nocow_pages_worker(struct work_struct *work) > { > struct scrub_copy_nocow_ctx *nocow_ctx = > container_of(work, struct scrub_copy_nocow_ctx, work); > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c > index 8eb6191..f557ab6 100644 > --- a/fs/btrfs/super.c > +++ b/fs/btrfs/super.c > @@ -1177,16 +1177,19 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, > btrfs_set_max_workers(&fs_info->workers, new_pool_size); > btrfs_set_max_workers(&fs_info->delalloc_workers, new_pool_size); > btrfs_set_max_workers(&fs_info->submit_workers, new_pool_size); > - btrfs_set_max_workers(&fs_info->caching_workers, new_pool_size); > - btrfs_set_max_workers(&fs_info->fixup_workers, new_pool_size); > - btrfs_set_max_workers(&fs_info->endio_workers, new_pool_size); > - btrfs_set_max_workers(&fs_info->endio_meta_workers, new_pool_size); > - btrfs_set_max_workers(&fs_info->endio_meta_write_workers, new_pool_size); > - btrfs_set_max_workers(&fs_info->endio_write_workers, new_pool_size); > - btrfs_set_max_workers(&fs_info->endio_freespace_worker, new_pool_size); > - btrfs_set_max_workers(&fs_info->delayed_workers, new_pool_size); > - btrfs_set_max_workers(&fs_info->readahead_workers, new_pool_size); > - btrfs_set_max_workers(&fs_info->scrub_wr_completion_workers, > + workqueue_set_max_active(fs_info->caching_workers, new_pool_size); > + workqueue_set_max_active(fs_info->fixup_workers, new_pool_size); > + workqueue_set_max_active(fs_info->endio_workers, new_pool_size); > + workqueue_set_max_active(fs_info->endio_meta_workers, new_pool_size); > + workqueue_set_max_active(fs_info->endio_meta_write_workers, > + 
new_pool_size); > + workqueue_set_max_active(fs_info->endio_write_workers, > + new_pool_size); > + workqueue_set_max_active(fs_info->endio_freespace_worker, > + new_pool_size); > + workqueue_set_max_active(fs_info->delayed_workers, new_pool_size); > + workqueue_set_max_active(fs_info->readahead_workers, new_pool_size); > + workqueue_set_max_active(fs_info->scrub_wr_completion_workers, > new_pool_size); > } > > -- > 1.8.4
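To make the two review points above concrete, here is a minimal sketch of the API difference in question. This is not code from the patch, and the demo_* names are hypothetical. With a bound (default) workqueue the workqueue core cooperates with the scheduler and max_active caps in-flight work items per CPU, while WQ_UNBOUND opts out of that concurrency management:

#include <linux/workqueue.h>

static struct workqueue_struct *demo_bound_wq;
static struct workqueue_struct *demo_unbound_wq;

static int demo_alloc_wqs(int thread_pool_size)
{
	/* Bound: per-CPU worker pools with scheduler-aware concurrency
	 * management; max_active limits concurrent work items per CPU. */
	demo_bound_wq = alloc_workqueue("demo-bound", WQ_MEM_RECLAIM,
					thread_pool_size);
	if (!demo_bound_wq)
		return -ENOMEM;

	/* WQ_UNBOUND: work items may run on any CPU, but the
	 * scheduler-based concurrency management is bypassed. */
	demo_unbound_wq = alloc_workqueue("demo-unbound",
					  WQ_UNBOUND | WQ_MEM_RECLAIM,
					  thread_pool_size);
	if (!demo_unbound_wq) {
		destroy_workqueue(demo_bound_wq);
		return -ENOMEM;
	}
	return 0;
}

Because max_active is a per-CPU limit for bound workqueues, passing thread_pool_size (a global cap on worker helpers in the old btrfs_workers model) as max_active changes meaning, which is exactly the mismatch the review points out.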
Qu Wenruo
2013-Sep-13 01:45 UTC
Re: [PATCH v2 2/9] btrfs: use kernel workqueue to replace the btrfs_workers functions
On 2013/09/13 09:29, Liu Bo wrote:

> On Thu, Sep 12, 2013 at 04:08:17PM +0800, Qu Wenruo wrote: >> Use the kernel workqueue to replace the btrfs_workers which are only >> used as normal workqueues. >> >> Other btrfs_workers will use some extra functions like requeue, high >> priority and ordered work. >> These btrfs_workers will not be touched in this patch. >> >> The following are the untouched btrfs_workers: >> >> generic_worker: As the helper for other btrfs_workers >> workers: Use the ordering and high priority features >> delalloc_workers: Use the ordering feature >> submit_workers: Use requeue feature >> >> All other workers can be replaced using the kernel workqueue directly. > Interesting, I've been doing the same work for a while, but I'm still > doing the tuning work on kernel wq + btrfs. > >> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> >> --- >> fs/btrfs/ctree.h | 39 +++++------ >> fs/btrfs/delayed-inode.c | 9 ++- >> fs/btrfs/disk-io.c | 164 ++++++++++++++++++----------------------------- >> fs/btrfs/extent-tree.c | 6 +- >> fs/btrfs/inode.c | 38 +++++------ >> fs/btrfs/ordered-data.c | 11 ++-- >> fs/btrfs/ordered-data.h | 4 +- >> fs/btrfs/qgroup.c | 16 ++--- >> fs/btrfs/raid56.c | 37 +++++------ >> fs/btrfs/reada.c | 8 +-- >> fs/btrfs/scrub.c | 84 ++++++++++++------ >> fs/btrfs/super.c | 23 ++++--- >> 12 files changed, 196 insertions(+), 243 deletions(-) >> >> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h >> index e795bf1..0dd6ec9 100644 >> --- a/fs/btrfs/ctree.h >> +++ b/fs/btrfs/ctree.h >> @@ -1202,7 +1202,7 @@ struct btrfs_caching_control { >> struct list_head list; >> struct mutex mutex; >> wait_queue_head_t wait; >> - struct btrfs_work work; >> + struct work_struct work; >> struct btrfs_block_group_cache *block_group; >> u64 progress; >> atomic_t count; >> @@ -1479,25 +1479,26 @@ struct btrfs_fs_info { >> struct btrfs_workers generic_worker; >> struct btrfs_workers workers; >> struct btrfs_workers delalloc_workers; >> - struct btrfs_workers flush_workers; >> - struct btrfs_workers endio_workers; >> - struct btrfs_workers endio_meta_workers; >> - struct btrfs_workers endio_raid56_workers; >> - struct btrfs_workers rmw_workers; >> - struct btrfs_workers endio_meta_write_workers; >> - struct btrfs_workers endio_write_workers; >> - struct btrfs_workers endio_freespace_worker; >> struct btrfs_workers submit_workers; >> - struct btrfs_workers caching_workers; >> - struct btrfs_workers readahead_workers; >> + >> + struct workqueue_struct *flush_workers; >> + struct workqueue_struct *endio_workers; >> + struct workqueue_struct *endio_meta_workers; >> + struct workqueue_struct *endio_raid56_workers; >> + struct workqueue_struct *rmw_workers; >> + struct workqueue_struct *endio_meta_write_workers; >> + struct workqueue_struct *endio_write_workers; >> + struct workqueue_struct *endio_freespace_worker; >> + struct workqueue_struct *caching_workers; >> + struct workqueue_struct *readahead_workers; >> >> /* >> * fixup workers take dirty pages that didn't properly go through >> * the cow mechanism and make them safe to write. 
It happens >> * for the sys_munmap function call path >> */ >> - struct btrfs_workers fixup_workers; >> - struct btrfs_workers delayed_workers; >> + struct workqueue_struct *fixup_workers; >> + struct workqueue_struct *delayed_workers; >> struct task_struct *transaction_kthread; >> struct task_struct *cleaner_kthread; >> int thread_pool_size; >> @@ -1576,9 +1577,9 @@ struct btrfs_fs_info { >> wait_queue_head_t scrub_pause_wait; >> struct rw_semaphore scrub_super_lock; >> int scrub_workers_refcnt; >> - struct btrfs_workers scrub_workers; >> - struct btrfs_workers scrub_wr_completion_workers; >> - struct btrfs_workers scrub_nocow_workers; >> + struct workqueue_struct *scrub_workers; >> + struct workqueue_struct *scrub_wr_completion_workers; >> + struct workqueue_struct *scrub_nocow_workers; >> >> #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY >> u32 check_integrity_print_mask; >> @@ -1619,9 +1620,9 @@ struct btrfs_fs_info { >> /* qgroup rescan items */ >> struct mutex qgroup_rescan_lock; /* protects the progress item */ >> struct btrfs_key qgroup_rescan_progress; >> - struct btrfs_workers qgroup_rescan_workers; >> + struct workqueue_struct *qgroup_rescan_workers; >> struct completion qgroup_rescan_completion; >> - struct btrfs_work qgroup_rescan_work; >> + struct work_struct qgroup_rescan_work; >> >> /* filesystem state */ >> unsigned long fs_state; >> @@ -3542,7 +3543,7 @@ struct btrfs_delalloc_work { >> int delay_iput; >> struct completion completion; >> struct list_head list; >> - struct btrfs_work work; >> + struct work_struct work; >> }; >> >> struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, >> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c >> index 5615eac..2b8da0a7 100644 >> --- a/fs/btrfs/delayed-inode.c >> +++ b/fs/btrfs/delayed-inode.c >> @@ -1258,10 +1258,10 @@ void btrfs_remove_delayed_node(struct inode *inode) >> struct btrfs_async_delayed_work { >> struct btrfs_delayed_root *delayed_root; >> int nr; >> - struct btrfs_work work; >> + struct work_struct work; >> }; >> >> -static void btrfs_async_run_delayed_root(struct btrfs_work *work) >> +static void btrfs_async_run_delayed_root(struct work_struct *work) >> { >> struct btrfs_async_delayed_work *async_work; >> struct btrfs_delayed_root *delayed_root; >> @@ -1359,11 +1359,10 @@ static int btrfs_wq_run_delayed_node(struct btrfs_delayed_root *delayed_root, >> return -ENOMEM; >> >> async_work->delayed_root = delayed_root; >> - async_work->work.func = btrfs_async_run_delayed_root; >> - async_work->work.flags = 0; >> + INIT_WORK(&async_work->work, btrfs_async_run_delayed_root); >> async_work->nr = nr; >> >> - btrfs_queue_worker(&root->fs_info->delayed_workers, &async_work->work); >> + queue_work(root->fs_info->delayed_workers, &async_work->work); >> return 0; >> } >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 3c2886c..d02a552 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -54,7 +54,7 @@ >> #endif >> >> static struct extent_io_ops btree_extent_io_ops; >> -static void end_workqueue_fn(struct btrfs_work *work); >> +static void end_workqueue_fn(struct work_struct *work); >> static void free_fs_root(struct btrfs_root *root); >> static int btrfs_check_super_valid(struct btrfs_fs_info *fs_info, >> int read_only); >> @@ -86,7 +86,7 @@ struct end_io_wq { >> int error; >> int metadata; >> struct list_head list; >> - struct btrfs_work work; >> + struct work_struct work; >> }; >> >> /* >> @@ -692,31 +692,30 @@ static void end_workqueue_bio(struct bio *bio, int err) >> 
>> fs_info = end_io_wq->info; >> end_io_wq->error = err; >> - end_io_wq->work.func = end_workqueue_fn; >> - end_io_wq->work.flags = 0; >> + INIT_WORK(&end_io_wq->work, end_workqueue_fn); >> >> if (bio->bi_rw & REQ_WRITE) { >> if (end_io_wq->metadata == BTRFS_WQ_ENDIO_METADATA) >> - btrfs_queue_worker(&fs_info->endio_meta_write_workers, >> - &end_io_wq->work); >> + queue_work(fs_info->endio_meta_write_workers, >> + &end_io_wq->work); >> else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_FREE_SPACE) >> - btrfs_queue_worker(&fs_info->endio_freespace_worker, >> - &end_io_wq->work); >> + queue_work(fs_info->endio_freespace_worker, >> + &end_io_wq->work); >> else if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) >> - btrfs_queue_worker(&fs_info->endio_raid56_workers, >> - &end_io_wq->work); >> + queue_work(fs_info->endio_raid56_workers, >> + &end_io_wq->work); >> else >> - btrfs_queue_worker(&fs_info->endio_write_workers, >> - &end_io_wq->work); >> + queue_work(fs_info->endio_write_workers, >> + &end_io_wq->work); >> } else { >> if (end_io_wq->metadata == BTRFS_WQ_ENDIO_RAID56) >> - btrfs_queue_worker(&fs_info->endio_raid56_workers, >> + queue_work(fs_info->endio_raid56_workers, >> &end_io_wq->work); >> else if (end_io_wq->metadata) >> - btrfs_queue_worker(&fs_info->endio_meta_workers, >> + queue_work(fs_info->endio_meta_workers, >> &end_io_wq->work); >> else >> - btrfs_queue_worker(&fs_info->endio_workers, >> + queue_work(fs_info->endio_workers, >> &end_io_wq->work); >> } >> } >> @@ -1662,7 +1661,7 @@ static int setup_bdi(struct btrfs_fs_info *info, struct backing_dev_info *bdi) >> * called by the kthread helper functions to finally call the bio end_io >> * functions. This is where read checksum verification actually happens >> */ >> -static void end_workqueue_fn(struct btrfs_work *work) >> +static void end_workqueue_fn(struct work_struct *work) >> { >> struct bio *bio; >> struct end_io_wq *end_io_wq; >> @@ -1987,22 +1986,22 @@ static noinline int next_root_backup(struct btrfs_fs_info *info, >> static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info) >> { >> btrfs_stop_workers(&fs_info->generic_worker); >> - btrfs_stop_workers(&fs_info->fixup_workers); >> btrfs_stop_workers(&fs_info->delalloc_workers); >> btrfs_stop_workers(&fs_info->workers); >> - btrfs_stop_workers(&fs_info->endio_workers); >> - btrfs_stop_workers(&fs_info->endio_meta_workers); >> - btrfs_stop_workers(&fs_info->endio_raid56_workers); >> - btrfs_stop_workers(&fs_info->rmw_workers); >> - btrfs_stop_workers(&fs_info->endio_meta_write_workers); >> - btrfs_stop_workers(&fs_info->endio_write_workers); >> - btrfs_stop_workers(&fs_info->endio_freespace_worker); >> btrfs_stop_workers(&fs_info->submit_workers); >> - btrfs_stop_workers(&fs_info->delayed_workers); >> - btrfs_stop_workers(&fs_info->caching_workers); >> - btrfs_stop_workers(&fs_info->readahead_workers); >> - btrfs_stop_workers(&fs_info->flush_workers); >> - btrfs_stop_workers(&fs_info->qgroup_rescan_workers); >> + destroy_workqueue(fs_info->fixup_workers); >> + destroy_workqueue(fs_info->endio_workers); >> + destroy_workqueue(fs_info->endio_meta_workers); >> + destroy_workqueue(fs_info->endio_raid56_workers); >> + destroy_workqueue(fs_info->rmw_workers); >> + destroy_workqueue(fs_info->endio_meta_write_workers); >> + destroy_workqueue(fs_info->endio_write_workers); >> + destroy_workqueue(fs_info->endio_freespace_worker); >> + destroy_workqueue(fs_info->delayed_workers); >> + destroy_workqueue(fs_info->caching_workers); >> + 
destroy_workqueue(fs_info->readahead_workers); >> + destroy_workqueue(fs_info->flush_workers); >> + destroy_workqueue(fs_info->qgroup_rescan_workers); >> } >> >> /* helper to cleanup tree roots */ >> @@ -2099,6 +2098,8 @@ int open_ctree(struct super_block *sb, >> struct btrfs_root *quota_root; >> struct btrfs_root *log_tree_root; >> int ret; >> + int max_active; >> + int flags = WQ_UNBOUND | WQ_MEM_RECLAIM; > Have you tried that without WQ_UNBOUND? > > IMO, kernel wq's biggest benefit is its concurrency management by > hooking into the scheduler, but UNBOUND just disables it. > > In my patch, I only set a few wq with WQ_UNBOUND.

Yes, already tried (replacing all the wq in *THIS* patch with bound wq), but the performance drops dramatically: about 15% overall, and some tests lose half their performance. The original btrfs workqueue tries to spread work across all CPUs, so an unbound workqueue is the better choice here. Also, the work items already handle concurrency control by themselves, so the kernel's concurrency management seems unneeded.

> >> int err = -EINVAL; >> int num_backups_tried = 0; >> int backup_index = 0; >> @@ -2457,6 +2458,7 @@ int open_ctree(struct super_block *sb, >> goto fail_alloc; >> } >> >> + max_active = fs_info->thread_pool_size; > For btrfs wq, 'max_active' is used as a maximum limit of worker helpers, > while for kernel wq, 'max_active' refers to at most how many work items of the > wq can be executing at the same time per CPU. > > I don't think @thread_pool_size is properly used here.

Yes, that's right. So the last patch (9/9) changes the thread_pool_size algorithm to use the default value (0) as the thread_pool_size. (The submit_limit algorithm is also changed so that it no longer returns 0.) Also, in my performance tests, even on a dual-core system with the original thread_pool_size (min(number of CPUs + 2, 8) = 4), the performance only drops by 5%.

> > thanks, > -liubo > >> btrfs_init_workers(&fs_info->generic_worker, >> "genwork", 1, NULL); >> >> @@ -2468,23 +2470,13 @@ int open_ctree(struct super_block *sb, >> fs_info->thread_pool_size, >> &fs_info->generic_worker); >> >> - btrfs_init_workers(&fs_info->flush_workers, "flush_delalloc", >> - fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - >> + fs_info->flush_workers = alloc_workqueue("flush_delalloc", flags, >> + max_active); >> btrfs_init_workers(&fs_info->submit_workers, "submit", >> min_t(u64, fs_devices->num_devices, >> fs_info->thread_pool_size), >> &fs_info->generic_worker); >> - >> - btrfs_init_workers(&fs_info->caching_workers, "cache", >> - 2, &fs_info->generic_worker); >> - >> - /* a higher idle thresh on the submit workers makes it much more >> - * likely that bios will be send down in a sane order to the >> - * devices >> - */ >> - fs_info->submit_workers.idle_thresh = 64; >> + fs_info->caching_workers = alloc_workqueue("cache", flags, 2); >> >> fs_info->workers.idle_thresh = 16; >> fs_info->workers.ordered = 1; >> @@ -2492,72 +2484,42 @@ int open_ctree(struct super_block *sb, >> fs_info->delalloc_workers.idle_thresh = 2; >> fs_info->delalloc_workers.ordered = 1; >> >> - btrfs_init_workers(&fs_info->fixup_workers, "fixup", 1, >> - &fs_info->generic_worker); >> - btrfs_init_workers(&fs_info->endio_workers, "endio", >> - fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - btrfs_init_workers(&fs_info->endio_meta_workers, "endio-meta", >> - fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - btrfs_init_workers(&fs_info->endio_meta_write_workers, >> - "endio-meta-write", fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - 
btrfs_init_workers(&fs_info->endio_raid56_workers, >> - "endio-raid56", fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - btrfs_init_workers(&fs_info->rmw_workers, >> - "rmw", fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - btrfs_init_workers(&fs_info->endio_write_workers, "endio-write", >> - fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - btrfs_init_workers(&fs_info->endio_freespace_worker, "freespace-write", >> - 1, &fs_info->generic_worker); >> - btrfs_init_workers(&fs_info->delayed_workers, "delayed-meta", >> - fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - btrfs_init_workers(&fs_info->readahead_workers, "readahead", >> - fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - btrfs_init_workers(&fs_info->qgroup_rescan_workers, "qgroup-rescan", 1, >> - &fs_info->generic_worker); >> - >> - /* >> - * endios are largely parallel and should have a very >> - * low idle thresh >> - */ >> - fs_info->endio_workers.idle_thresh = 4; >> - fs_info->endio_meta_workers.idle_thresh = 4; >> - fs_info->endio_raid56_workers.idle_thresh = 4; >> - fs_info->rmw_workers.idle_thresh = 2; >> - >> - fs_info->endio_write_workers.idle_thresh = 2; >> - fs_info->endio_meta_write_workers.idle_thresh = 2; >> - fs_info->readahead_workers.idle_thresh = 2; >> - >> + fs_info->fixup_workers = alloc_workqueue("fixup", flags, 1); >> + fs_info->endio_workers = alloc_workqueue("endio", flags, max_active); >> + fs_info->endio_meta_workers = alloc_workqueue("endio-meta", flags, >> + max_active); >> + fs_info->endio_meta_write_workers = alloc_workqueue("endio-meta-write", >> + flags, max_active); >> + fs_info->endio_raid56_workers = alloc_workqueue("endio-raid56", flags, >> + max_active); >> + fs_info->rmw_workers = alloc_workqueue("rmw", flags, max_active); >> + fs_info->endio_write_workers = alloc_workqueue("endio-write", flags, >> + max_active); >> + fs_info->endio_freespace_worker = alloc_workqueue("freespace-write", >> + flags, 1); >> + fs_info->delayed_workers = alloc_workqueue("delayed_meta", flags, >> + max_active); >> + fs_info->readahead_workers = alloc_workqueue("readahead", flags, >> + max_active); >> + fs_info->qgroup_rescan_workers = alloc_workqueue("group-rescan", >> + flags, 1); >> /* >> * btrfs_start_workers can really only fail because of ENOMEM so just >> * return -ENOMEM if any of these fail. 
>> */ >> ret = btrfs_start_workers(&fs_info->workers); >> ret |= btrfs_start_workers(&fs_info->generic_worker); >> - ret |= btrfs_start_workers(&fs_info->submit_workers); >> ret |= btrfs_start_workers(&fs_info->delalloc_workers); >> - ret |= btrfs_start_workers(&fs_info->fixup_workers); >> - ret |= btrfs_start_workers(&fs_info->endio_workers); >> - ret |= btrfs_start_workers(&fs_info->endio_meta_workers); >> - ret |= btrfs_start_workers(&fs_info->rmw_workers); >> - ret |= btrfs_start_workers(&fs_info->endio_raid56_workers); >> - ret |= btrfs_start_workers(&fs_info->endio_meta_write_workers); >> - ret |= btrfs_start_workers(&fs_info->endio_write_workers); >> - ret |= btrfs_start_workers(&fs_info->endio_freespace_worker); >> - ret |= btrfs_start_workers(&fs_info->delayed_workers); >> - ret |= btrfs_start_workers(&fs_info->caching_workers); >> - ret |= btrfs_start_workers(&fs_info->readahead_workers); >> - ret |= btrfs_start_workers(&fs_info->flush_workers); >> - ret |= btrfs_start_workers(&fs_info->qgroup_rescan_workers); >> - if (ret) { >> + ret |= btrfs_start_workers(&fs_info->submit_workers); >> + >> + if (ret || !(fs_info->flush_workers && fs_info->endio_workers && >> + fs_info->endio_meta_workers && >> + fs_info->endio_raid56_workers && >> + fs_info->rmw_workers && fs_info->qgroup_rescan_workers && >> + fs_info->endio_meta_write_workers && >> + fs_info->endio_write_workers && >> + fs_info->caching_workers && fs_info->readahead_workers && >> + fs_info->fixup_workers && fs_info->delayed_workers)) { >> err = -ENOMEM; >> goto fail_sb_buffer; >> } >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >> index 0236de7..c8f67d9 100644 >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -377,7 +377,7 @@ static u64 add_new_free_space(struct btrfs_block_group_cache *block_group, >> return total_added; >> } >> >> -static noinline void caching_thread(struct btrfs_work *work) >> +static noinline void caching_thread(struct work_struct *work) >> { >> struct btrfs_block_group_cache *block_group; >> struct btrfs_fs_info *fs_info; >> @@ -530,7 +530,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache, >> caching_ctl->block_group = cache; >> caching_ctl->progress = cache->key.objectid; >> atomic_set(&caching_ctl->count, 1); >> - caching_ctl->work.func = caching_thread; >> + INIT_WORK(&caching_ctl->work, caching_thread); >> >> spin_lock(&cache->lock); >> /* >> @@ -621,7 +621,7 @@ static int cache_block_group(struct btrfs_block_group_cache *cache, >> >> btrfs_get_block_group(cache); >> >> - btrfs_queue_worker(&fs_info->caching_workers, &caching_ctl->work); >> + queue_work(fs_info->caching_workers, &caching_ctl->work); >> >> return ret; >> } >> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >> index b7c2487..53901a5 100644 >> --- a/fs/btrfs/inode.c >> +++ b/fs/btrfs/inode.c >> @@ -1818,10 +1818,10 @@ int btrfs_set_extent_delalloc(struct inode *inode, u64 start, u64 end, >> /* see btrfs_writepage_start_hook for details on why this is required */ >> struct btrfs_writepage_fixup { >> struct page *page; >> - struct btrfs_work work; >> + struct work_struct work; >> }; >> >> -static void btrfs_writepage_fixup_worker(struct btrfs_work *work) >> +static void btrfs_writepage_fixup_worker(struct work_struct *work) >> { >> struct btrfs_writepage_fixup *fixup; >> struct btrfs_ordered_extent *ordered; >> @@ -1912,9 +1912,9 @@ static int btrfs_writepage_start_hook(struct page *page, u64 start, u64 end) >> >> SetPageChecked(page); >> page_cache_get(page); >> - 
fixup->work.func = btrfs_writepage_fixup_worker; >> + INIT_WORK(&fixup->work, btrfs_writepage_fixup_worker); >> fixup->page = page; >> - btrfs_queue_worker(&root->fs_info->fixup_workers, &fixup->work); >> + queue_work(root->fs_info->fixup_workers, &fixup->work); >> return -EBUSY; >> } >> >> @@ -2780,7 +2780,7 @@ out: >> return ret; >> } >> >> -static void finish_ordered_fn(struct btrfs_work *work) >> +static void finish_ordered_fn(struct work_struct *work) >> { >> struct btrfs_ordered_extent *ordered_extent; >> ordered_extent = container_of(work, struct btrfs_ordered_extent, work); >> @@ -2793,7 +2793,7 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end, >> struct inode *inode = page->mapping->host; >> struct btrfs_root *root = BTRFS_I(inode)->root; >> struct btrfs_ordered_extent *ordered_extent = NULL; >> - struct btrfs_workers *workers; >> + struct workqueue_struct *workers; >> >> trace_btrfs_writepage_end_io_hook(page, start, end, uptodate); >> >> @@ -2802,14 +2802,13 @@ static int btrfs_writepage_end_io_hook(struct page *page, u64 start, u64 end, >> end - start + 1, uptodate)) >> return 0; >> >> - ordered_extent->work.func = finish_ordered_fn; >> - ordered_extent->work.flags = 0; >> + INIT_WORK(&ordered_extent->work, finish_ordered_fn); >> >> if (btrfs_is_free_space_inode(inode)) >> - workers = &root->fs_info->endio_freespace_worker; >> + workers = root->fs_info->endio_freespace_worker; >> else >> - workers = &root->fs_info->endio_write_workers; >> - btrfs_queue_worker(workers, &ordered_extent->work); >> + workers = root->fs_info->endio_write_workers; >> + queue_work(workers, &ordered_extent->work); >> >> return 0; >> } >> @@ -6906,10 +6905,9 @@ again: >> if (!ret) >> goto out_test; >> >> - ordered->work.func = finish_ordered_fn; >> - ordered->work.flags = 0; >> - btrfs_queue_worker(&root->fs_info->endio_write_workers, >> - &ordered->work); >> + INIT_WORK(&ordered->work, finish_ordered_fn); >> + queue_work(root->fs_info->endio_write_workers, &ordered->work); >> + >> out_test: >> /* >> * our bio might span multiple ordered extents. 
If we haven't >> @@ -8187,7 +8185,7 @@ out_notrans: >> return ret; >> } >> >> -static void btrfs_run_delalloc_work(struct btrfs_work *work) >> +static void btrfs_run_delalloc_work(struct work_struct *work) >> { >> struct btrfs_delalloc_work *delalloc_work; >> >> @@ -8206,7 +8204,7 @@ static void btrfs_run_delalloc_work(struct btrfs_work *work) >> } >> >> struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, >> - int wait, int delay_iput) >> + int wait, int delay_iput) >> { >> struct btrfs_delalloc_work *work; >> >> @@ -8219,8 +8217,7 @@ struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode, >> work->inode = inode; >> work->wait = wait; >> work->delay_iput = delay_iput; >> - work->work.func = btrfs_run_delalloc_work; >> - >> + INIT_WORK(&work->work, btrfs_run_delalloc_work); >> return work; >> } >> >> @@ -8267,8 +8264,7 @@ static int __start_delalloc_inodes(struct btrfs_root *root, int delay_iput) >> goto out; >> } >> list_add_tail(&work->list, &works); >> - btrfs_queue_worker(&root->fs_info->flush_workers, >> - &work->work); >> + queue_work(root->fs_info->flush_workers, &work->work); >> >> cond_resched(); >> spin_lock(&root->delalloc_lock); >> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c >> index 8136982..9b5ccac 100644 >> --- a/fs/btrfs/ordered-data.c >> +++ b/fs/btrfs/ordered-data.c >> @@ -552,7 +552,7 @@ void btrfs_remove_ordered_extent(struct inode *inode, >> wake_up(&entry->wait); >> } >> >> -static void btrfs_run_ordered_extent_work(struct btrfs_work *work) >> +static void btrfs_run_ordered_extent_work(struct work_struct *work) >> { >> struct btrfs_ordered_extent *ordered; >> >> @@ -594,10 +594,9 @@ void btrfs_wait_ordered_extents(struct btrfs_root *root, int delay_iput) >> atomic_inc(&ordered->refs); >> spin_unlock(&root->ordered_extent_lock); >> >> - ordered->flush_work.func = btrfs_run_ordered_extent_work; >> + INIT_WORK(&ordered->flush_work, btrfs_run_ordered_extent_work); >> list_add_tail(&ordered->work_list, &works); >> - btrfs_queue_worker(&root->fs_info->flush_workers, >> - &ordered->flush_work); >> + queue_work(root->fs_info->flush_workers, &ordered->flush_work); >> >> cond_resched(); >> spin_lock(&root->ordered_extent_lock); >> @@ -706,8 +705,8 @@ int btrfs_run_ordered_operations(struct btrfs_trans_handle *trans, >> goto out; >> } >> list_add_tail(&work->list, &works); >> - btrfs_queue_worker(&root->fs_info->flush_workers, >> - &work->work); >> + queue_work(root->fs_info->flush_workers, >> + &work->work); >> >> cond_resched(); >> spin_lock(&root->fs_info->ordered_root_lock); >> diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h >> index 68844d5..f4c81d7 100644 >> --- a/fs/btrfs/ordered-data.h >> +++ b/fs/btrfs/ordered-data.h >> @@ -123,10 +123,10 @@ struct btrfs_ordered_extent { >> /* a per root list of all the pending ordered extents */ >> struct list_head root_extent_list; >> >> - struct btrfs_work work; >> + struct work_struct work; >> >> struct completion completion; >> - struct btrfs_work flush_work; >> + struct work_struct flush_work; >> struct list_head work_list; >> }; >> >> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >> index 1280eff..a49fdfe 100644 >> --- a/fs/btrfs/qgroup.c >> +++ b/fs/btrfs/qgroup.c >> @@ -1528,8 +1528,8 @@ int btrfs_run_qgroups(struct btrfs_trans_handle *trans, >> ret = qgroup_rescan_init(fs_info, 0, 1); >> if (!ret) { >> qgroup_rescan_zero_tracking(fs_info); >> - btrfs_queue_worker(&fs_info->qgroup_rescan_workers, >> - &fs_info->qgroup_rescan_work); >> + 
queue_work(fs_info->qgroup_rescan_workers, >> + &fs_info->qgroup_rescan_work); >> } >> ret = 0; >> } >> @@ -1994,7 +1994,7 @@ out: >> return ret; >> } >> >> -static void btrfs_qgroup_rescan_worker(struct btrfs_work *work) >> +static void btrfs_qgroup_rescan_worker(struct work_struct *work) >> { >> struct btrfs_fs_info *fs_info = container_of(work, struct btrfs_fs_info, >> qgroup_rescan_work); >> @@ -2105,7 +2105,7 @@ qgroup_rescan_init(struct btrfs_fs_info *fs_info, u64 progress_objectid, >> >> memset(&fs_info->qgroup_rescan_work, 0, >> sizeof(fs_info->qgroup_rescan_work)); >> - fs_info->qgroup_rescan_work.func = btrfs_qgroup_rescan_worker; >> + INIT_WORK(&fs_info->qgroup_rescan_work, btrfs_qgroup_rescan_worker); >> >> if (ret) { >> err: >> @@ -2168,8 +2168,8 @@ btrfs_qgroup_rescan(struct btrfs_fs_info *fs_info) >> >> qgroup_rescan_zero_tracking(fs_info); >> >> - btrfs_queue_worker(&fs_info->qgroup_rescan_workers, >> - &fs_info->qgroup_rescan_work); >> + queue_work(fs_info->qgroup_rescan_workers, >> + &fs_info->qgroup_rescan_work); >> >> return 0; >> } >> @@ -2200,6 +2200,6 @@ void >> btrfs_qgroup_rescan_resume(struct btrfs_fs_info *fs_info) >> { >> if (fs_info->qgroup_flags & BTRFS_QGROUP_STATUS_FLAG_RESCAN) >> - btrfs_queue_worker(&fs_info->qgroup_rescan_workers, >> - &fs_info->qgroup_rescan_work); >> + queue_work(fs_info->qgroup_rescan_workers, >> + &fs_info->qgroup_rescan_work); >> } >> diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c >> index 0525e13..4b7769d 100644 >> --- a/fs/btrfs/raid56.c >> +++ b/fs/btrfs/raid56.c >> @@ -88,7 +88,7 @@ struct btrfs_raid_bio { >> /* >> * for scheduling work in the helper threads >> */ >> - struct btrfs_work work; >> + struct work_struct work; >> >> /* >> * bio list and bio_list_lock are used >> @@ -167,8 +167,8 @@ struct btrfs_raid_bio { >> >> static int __raid56_parity_recover(struct btrfs_raid_bio *rbio); >> static noinline void finish_rmw(struct btrfs_raid_bio *rbio); >> -static void rmw_work(struct btrfs_work *work); >> -static void read_rebuild_work(struct btrfs_work *work); >> +static void rmw_work(struct work_struct *work); >> +static void read_rebuild_work(struct work_struct *work); >> static void async_rmw_stripe(struct btrfs_raid_bio *rbio); >> static void async_read_rebuild(struct btrfs_raid_bio *rbio); >> static int fail_bio_stripe(struct btrfs_raid_bio *rbio, struct bio *bio); >> @@ -1417,20 +1417,16 @@ cleanup: >> >> static void async_rmw_stripe(struct btrfs_raid_bio *rbio) >> { >> - rbio->work.flags = 0; >> - rbio->work.func = rmw_work; >> - >> - btrfs_queue_worker(&rbio->fs_info->rmw_workers, >> - &rbio->work); >> + INIT_WORK(&rbio->work, rmw_work); >> + queue_work(rbio->fs_info->rmw_workers, >> + &rbio->work); >> } >> >> static void async_read_rebuild(struct btrfs_raid_bio *rbio) >> { >> - rbio->work.flags = 0; >> - rbio->work.func = read_rebuild_work; >> - >> - btrfs_queue_worker(&rbio->fs_info->rmw_workers, >> - &rbio->work); >> + INIT_WORK(&rbio->work, read_rebuild_work); >> + queue_work(rbio->fs_info->rmw_workers, >> + &rbio->work); >> } >> >> /* >> @@ -1589,7 +1585,7 @@ struct btrfs_plug_cb { >> struct blk_plug_cb cb; >> struct btrfs_fs_info *info; >> struct list_head rbio_list; >> - struct btrfs_work work; >> + struct work_struct work; >> }; >> >> /* >> @@ -1653,7 +1649,7 @@ static void run_plug(struct btrfs_plug_cb *plug) >> * if the unplug comes from schedule, we have to push the >> * work off to a helper thread >> */ >> -static void unplug_work(struct btrfs_work *work) >> +static void unplug_work(struct work_struct 
*work) >> { >> struct btrfs_plug_cb *plug; >> plug = container_of(work, struct btrfs_plug_cb, work); >> @@ -1666,10 +1662,9 @@ static void btrfs_raid_unplug(struct blk_plug_cb *cb, bool from_schedule) >> plug = container_of(cb, struct btrfs_plug_cb, cb); >> >> if (from_schedule) { >> - plug->work.flags = 0; >> - plug->work.func = unplug_work; >> - btrfs_queue_worker(&plug->info->rmw_workers, >> - &plug->work); >> + INIT_WORK(&plug->work, unplug_work); >> + queue_work(plug->info->rmw_workers, >> + &plug->work); >> return; >> } >> run_plug(plug); >> @@ -2083,7 +2078,7 @@ int raid56_parity_recover(struct btrfs_root *root, struct bio *bio, >> >> } >> >> -static void rmw_work(struct btrfs_work *work) >> +static void rmw_work(struct work_struct *work) >> { >> struct btrfs_raid_bio *rbio; >> >> @@ -2091,7 +2086,7 @@ static void rmw_work(struct btrfs_work *work) >> raid56_rmw_stripe(rbio); >> } >> >> -static void read_rebuild_work(struct btrfs_work *work) >> +static void read_rebuild_work(struct work_struct *work) >> { >> struct btrfs_raid_bio *rbio; >> >> diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c >> index 1031b69..9607648 100644 >> --- a/fs/btrfs/reada.c >> +++ b/fs/btrfs/reada.c >> @@ -91,7 +91,7 @@ struct reada_zone { >> }; >> >> struct reada_machine_work { >> - struct btrfs_work work; >> + struct work_struct work; >> struct btrfs_fs_info *fs_info; >> }; >> >> @@ -732,7 +732,7 @@ static int reada_start_machine_dev(struct btrfs_fs_info *fs_info, >> >> } >> >> -static void reada_start_machine_worker(struct btrfs_work *work) >> +static void reada_start_machine_worker(struct work_struct *work) >> { >> struct reada_machine_work *rmw; >> struct btrfs_fs_info *fs_info; >> @@ -792,10 +792,10 @@ static void reada_start_machine(struct btrfs_fs_info *fs_info) >> /* FIXME we cannot handle this properly right now */ >> BUG(); >> } >> - rmw->work.func = reada_start_machine_worker; >> + INIT_WORK(&rmw->work, reada_start_machine_worker); >> rmw->fs_info = fs_info; >> >> - btrfs_queue_worker(&fs_info->readahead_workers, &rmw->work); >> + queue_work(fs_info->readahead_workers, &rmw->work); >> } >> >> #ifdef DEBUG >> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c >> index 4ba2a69..025bb53 100644 >> --- a/fs/btrfs/scrub.c >> +++ b/fs/btrfs/scrub.c >> @@ -96,7 +96,7 @@ struct scrub_bio { >> #endif >> int page_count; >> int next_free; >> - struct btrfs_work work; >> + struct work_struct work; >> }; >> >> struct scrub_block { >> @@ -154,7 +154,7 @@ struct scrub_fixup_nodatasum { >> struct btrfs_device *dev; >> u64 logical; >> struct btrfs_root *root; >> - struct btrfs_work work; >> + struct work_struct work; >> int mirror_num; >> }; >> >> @@ -164,7 +164,7 @@ struct scrub_copy_nocow_ctx { >> u64 len; >> int mirror_num; >> u64 physical_for_dev_replace; >> - struct btrfs_work work; >> + struct work_struct work; >> }; >> >> struct scrub_warning { >> @@ -224,7 +224,7 @@ static int scrub_pages(struct scrub_ctx *sctx, u64 logical, u64 len, >> u64 gen, int mirror_num, u8 *csum, int force, >> u64 physical_for_dev_replace); >> static void scrub_bio_end_io(struct bio *bio, int err); >> -static void scrub_bio_end_io_worker(struct btrfs_work *work); >> +static void scrub_bio_end_io_worker(struct work_struct *work); >> static void scrub_block_complete(struct scrub_block *sblock); >> static void scrub_remap_extent(struct btrfs_fs_info *fs_info, >> u64 extent_logical, u64 extent_len, >> @@ -241,14 +241,14 @@ static int scrub_add_page_to_wr_bio(struct scrub_ctx *sctx, >> struct scrub_page *spage); >> static void 
scrub_wr_submit(struct scrub_ctx *sctx); >> static void scrub_wr_bio_end_io(struct bio *bio, int err); >> -static void scrub_wr_bio_end_io_worker(struct btrfs_work *work); >> +static void scrub_wr_bio_end_io_worker(struct work_struct *work); >> static int write_page_nocow(struct scrub_ctx *sctx, >> u64 physical_for_dev_replace, struct page *page); >> static int copy_nocow_pages_for_inode(u64 inum, u64 offset, u64 root, >> void *ctx); >> static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len, >> int mirror_num, u64 physical_for_dev_replace); >> -static void copy_nocow_pages_worker(struct btrfs_work *work); >> +static void copy_nocow_pages_worker(struct work_struct *work); >> >> >> static void scrub_pending_bio_inc(struct scrub_ctx *sctx) >> @@ -386,7 +386,7 @@ struct scrub_ctx *scrub_setup_ctx(struct btrfs_device *dev, int is_dev_replace) >> sbio->index = i; >> sbio->sctx = sctx; >> sbio->page_count = 0; >> - sbio->work.func = scrub_bio_end_io_worker; >> + INIT_WORK(&sbio->work, scrub_bio_end_io_worker); >> >> if (i != SCRUB_BIOS_PER_SCTX - 1) >> sctx->bios[i]->next_free = i + 1; >> @@ -691,7 +691,7 @@ out: >> return -EIO; >> } >> >> -static void scrub_fixup_nodatasum(struct btrfs_work *work) >> +static void scrub_fixup_nodatasum(struct work_struct *work) >> { >> int ret; >> struct scrub_fixup_nodatasum *fixup; >> @@ -956,9 +956,8 @@ nodatasum_case: >> fixup_nodatasum->root = fs_info->extent_root; >> fixup_nodatasum->mirror_num = failed_mirror_index + 1; >> scrub_pending_trans_workers_inc(sctx); >> - fixup_nodatasum->work.func = scrub_fixup_nodatasum; >> - btrfs_queue_worker(&fs_info->scrub_workers, >> - &fixup_nodatasum->work); >> + INIT_WORK(&fixup_nodatasum->work, scrub_fixup_nodatasum); >> + queue_work(fs_info->scrub_workers, &fixup_nodatasum->work); >> goto out; >> } >> >> @@ -1592,11 +1591,11 @@ static void scrub_wr_bio_end_io(struct bio *bio, int err) >> sbio->err = err; >> sbio->bio = bio; >> >> - sbio->work.func = scrub_wr_bio_end_io_worker; >> - btrfs_queue_worker(&fs_info->scrub_wr_completion_workers, &sbio->work); >> + INIT_WORK(&sbio->work, scrub_wr_bio_end_io_worker); >> + queue_work(fs_info->scrub_wr_completion_workers, &sbio->work); >> } >> >> -static void scrub_wr_bio_end_io_worker(struct btrfs_work *work) >> +static void scrub_wr_bio_end_io_worker(struct work_struct *work) >> { >> struct scrub_bio *sbio = container_of(work, struct scrub_bio, work); >> struct scrub_ctx *sctx = sbio->sctx; >> @@ -2061,10 +2060,10 @@ static void scrub_bio_end_io(struct bio *bio, int err) >> sbio->err = err; >> sbio->bio = bio; >> >> - btrfs_queue_worker(&fs_info->scrub_workers, &sbio->work); >> + queue_work(fs_info->scrub_workers, &sbio->work); >> } >> >> -static void scrub_bio_end_io_worker(struct btrfs_work *work) >> +static void scrub_bio_end_io_worker(struct work_struct *work) >> { >> struct scrub_bio *sbio = container_of(work, struct scrub_bio, work); >> struct scrub_ctx *sctx = sbio->sctx; >> @@ -2778,34 +2777,33 @@ static noinline_for_stack int scrub_workers_get(struct btrfs_fs_info *fs_info, >> int is_dev_replace) >> { >> int ret = 0; >> + int flags = WQ_UNBOUND | WQ_MEM_RECLAIM; >> + int max_active = fs_info->thread_pool_size; >> >> mutex_lock(&fs_info->scrub_lock); >> if (fs_info->scrub_workers_refcnt == 0) { >> if (is_dev_replace) >> - btrfs_init_workers(&fs_info->scrub_workers, "scrub", 1, >> - &fs_info->generic_worker); >> + fs_info->scrub_workers = >> + alloc_workqueue("scrub", flags, 1); >> else >> - btrfs_init_workers(&fs_info->scrub_workers, "scrub", >> - 
fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - fs_info->scrub_workers.idle_thresh = 4; >> - ret = btrfs_start_workers(&fs_info->scrub_workers); >> - if (ret) >> + fs_info->scrub_workers = >> + alloc_workqueue("scrub", flags, max_active); >> + if (!fs_info->scrub_workers) { >> + ret = -ENOMEM; >> goto out; >> - btrfs_init_workers(&fs_info->scrub_wr_completion_workers, >> - "scrubwrc", >> - fs_info->thread_pool_size, >> - &fs_info->generic_worker); >> - fs_info->scrub_wr_completion_workers.idle_thresh = 2; >> - ret = btrfs_start_workers( >> - &fs_info->scrub_wr_completion_workers); >> - if (ret) >> + } >> + fs_info->scrub_wr_completion_workers = >> + alloc_workqueue("scrubwrc", flags, max_active); >> + if (!fs_info->scrub_wr_completion_workers) { >> + ret = -ENOMEM; >> goto out; >> - btrfs_init_workers(&fs_info->scrub_nocow_workers, "scrubnc", 1, >> - &fs_info->generic_worker); >> - ret = btrfs_start_workers(&fs_info->scrub_nocow_workers); >> - if (ret) >> + } >> + fs_info->scrub_nocow_workers = >> + alloc_workqueue("scrubnc", flags, 1); >> + if (!fs_info->scrub_nocow_workers) { >> + ret = -ENOMEM; >> goto out; >> + } >> } >> ++fs_info->scrub_workers_refcnt; >> out: >> @@ -2818,9 +2816,9 @@ static noinline_for_stack void scrub_workers_put(struct btrfs_fs_info *fs_info) >> { >> mutex_lock(&fs_info->scrub_lock); >> if (--fs_info->scrub_workers_refcnt == 0) { >> - btrfs_stop_workers(&fs_info->scrub_workers); >> - btrfs_stop_workers(&fs_info->scrub_wr_completion_workers); >> - btrfs_stop_workers(&fs_info->scrub_nocow_workers); >> + destroy_workqueue(fs_info->scrub_workers); >> + destroy_workqueue(fs_info->scrub_wr_completion_workers); >> + destroy_workqueue(fs_info->scrub_nocow_workers); >> } >> WARN_ON(fs_info->scrub_workers_refcnt < 0); >> mutex_unlock(&fs_info->scrub_lock); >> @@ -3130,14 +3128,14 @@ static int copy_nocow_pages(struct scrub_ctx *sctx, u64 logical, u64 len, >> nocow_ctx->len = len; >> nocow_ctx->mirror_num = mirror_num; >> nocow_ctx->physical_for_dev_replace = physical_for_dev_replace; >> - nocow_ctx->work.func = copy_nocow_pages_worker; >> - btrfs_queue_worker(&fs_info->scrub_nocow_workers, >> - &nocow_ctx->work); >> + INIT_WORK(&nocow_ctx->work, copy_nocow_pages_worker); >> + queue_work(fs_info->scrub_nocow_workers, >> + &nocow_ctx->work); >> >> return 0; >> } >> >> -static void copy_nocow_pages_worker(struct btrfs_work *work) >> +static void copy_nocow_pages_worker(struct work_struct *work) >> { >> struct scrub_copy_nocow_ctx *nocow_ctx = >> container_of(work, struct scrub_copy_nocow_ctx, work); >> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c >> index 8eb6191..f557ab6 100644 >> --- a/fs/btrfs/super.c >> +++ b/fs/btrfs/super.c >> @@ -1177,16 +1177,19 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info, >> btrfs_set_max_workers(&fs_info->workers, new_pool_size); >> btrfs_set_max_workers(&fs_info->delalloc_workers, new_pool_size); >> btrfs_set_max_workers(&fs_info->submit_workers, new_pool_size); >> - btrfs_set_max_workers(&fs_info->caching_workers, new_pool_size); >> - btrfs_set_max_workers(&fs_info->fixup_workers, new_pool_size); >> - btrfs_set_max_workers(&fs_info->endio_workers, new_pool_size); >> - btrfs_set_max_workers(&fs_info->endio_meta_workers, new_pool_size); >> - btrfs_set_max_workers(&fs_info->endio_meta_write_workers, new_pool_size); >> - btrfs_set_max_workers(&fs_info->endio_write_workers, new_pool_size); >> - btrfs_set_max_workers(&fs_info->endio_freespace_worker, new_pool_size); >> - 
btrfs_set_max_workers(&fs_info->delayed_workers, new_pool_size); >> - btrfs_set_max_workers(&fs_info->readahead_workers, new_pool_size); >> - btrfs_set_max_workers(&fs_info->scrub_wr_completion_workers, >> + workqueue_set_max_active(fs_info->caching_workers, new_pool_size); >> + workqueue_set_max_active(fs_info->fixup_workers, new_pool_size); >> + workqueue_set_max_active(fs_info->endio_workers, new_pool_size); >> + workqueue_set_max_active(fs_info->endio_meta_workers, new_pool_size); >> + workqueue_set_max_active(fs_info->endio_meta_write_workers, >> + new_pool_size); >> + workqueue_set_max_active(fs_info->endio_write_workers, >> + new_pool_size); >> + workqueue_set_max_active(fs_info->endio_freespace_worker, >> + new_pool_size); >> + workqueue_set_max_active(fs_info->delayed_workers, new_pool_size); >> + workqueue_set_max_active(fs_info->readahead_workers, new_pool_size); >> + workqueue_set_max_active(fs_info->scrub_wr_completion_workers, >> new_pool_size); >> } >> >> -- >> 1.8.4 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html-- ----------------------------------------------------- Qu Wenruo Development Dept.I Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road, Nanjing, 210012, China TEL: +86+25-86630566-8526 COINS: 7998-8526 FAX: +86+25-83317685 MAIL: quwenruo@cn.fujitsu.com ----------------------------------------------------- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
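For reference, the conversion applied throughout the patch above follows one mechanical pattern: the handler takes a struct work_struct * and recovers its container via container_of(), and INIT_WORK() plus queue_work() replace the open-coded .func assignment and btrfs_queue_worker(). A minimal sketch of the pattern (the demo_* names and the payload field are illustrative only, not taken from the patch):

    #include <linux/workqueue.h>
    #include <linux/slab.h>
    #include <linux/printk.h>

    /* hypothetical work item; the work_struct stays embedded so the
     * handler can recover the containing object with container_of() */
    struct demo_work {
            struct work_struct work;
            int payload;
    };

    static void demo_work_fn(struct work_struct *work)
    {
            struct demo_work *dw = container_of(work, struct demo_work, work);

            pr_info("handling payload %d\n", dw->payload);
            kfree(dw);
    }

    static int demo_queue(struct workqueue_struct *wq, int payload)
    {
            struct demo_work *dw = kmalloc(sizeof(*dw), GFP_NOFS);

            if (!dw)
                    return -ENOMEM;
            dw->payload = payload;
            /* old style: dw->work.func = demo_work_fn;
             *            btrfs_queue_worker(&some_workers, &dw->work); */
            INIT_WORK(&dw->work, demo_work_fn);
            queue_work(wq, &dw->work);
            return 0;
    }

This is also why every worker function signature in the diff changes from taking a struct btrfs_work * to a struct work_struct *.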
Liu Bo
2013-Sep-13 01:47 UTC
Re: [PATCH v2 9/9] btrfs: Replace thread_pool_size with workqueue default value
On Thu, Sep 12, 2013 at 04:08:24PM +0800, Qu Wenruo wrote: > The original btrfs_workers uses the fs_info->thread_pool_size as the > max_active, and the previous patches followed this way. > > But the kernel workqueue has the default value(0) for workqueue, > and workqueue itself has some threshold mechanism to prevent creating > too many threads, so we should use the default value. > > Since the thread_pool_size algorithm is not used, related codes should > also be changed. Ohh, I should have seen this mail first before commenting on 'max_active'. I think that some tuning work should be done on this part; according to my tests, setting max_active=0 will create ~258 worker helpers (kworker/uX:X if you set WQ_UNBOUND), and this may cause too many context switches, which will have an impact on performance in some cases. -liubo > > Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> > --- > fs/btrfs/disk-io.c | 12 +++++++----- > fs/btrfs/super.c | 3 +-- > 2 files changed, 8 insertions(+), 7 deletions(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index a61e1fe..0446d27 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -750,9 +750,11 @@ int btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio, > > unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info) > { > - unsigned long limit = min_t(unsigned long, > - info->thread_pool_size, > - info->fs_devices->open_devices); > + unsigned long limit; > + limit = info->thread_pool_size ? > + min_t(unsigned long, info->thread_pool_size, > + info->fs_devices->open_devices) : > + info->fs_devices->open_devices; > return 256 * limit; > } > > @@ -2191,8 +2193,8 @@ int open_ctree(struct super_block *sb, > INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT); > spin_lock_init(&fs_info->reada_lock); > > - fs_info->thread_pool_size = min_t(unsigned long, > - num_online_cpus() + 2, 8); > + /* use the default value of kernel workqueue */ > + fs_info->thread_pool_size = 0; > > INIT_LIST_HEAD(&fs_info->ordered_roots); > spin_lock_init(&fs_info->ordered_root_lock); > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c > index 63e653c..ccf412f 100644 > --- a/fs/btrfs/super.c > +++ b/fs/btrfs/super.c > @@ -898,8 +898,7 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry) > if (info->alloc_start != 0) > seq_printf(seq, ",alloc_start=%llu", > (unsigned long long)info->alloc_start); > - if (info->thread_pool_size != min_t(unsigned long, > - num_online_cpus() + 2, 8)) > + if (info->thread_pool_size) > seq_printf(seq, ",thread_pool=%d", info->thread_pool_size); > if (btrfs_test_opt(root, COMPRESS)) { > if (info->compress_type == BTRFS_COMPRESS_ZLIB) > -- > 1.8.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
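The max_active behaviour measured here can be sketched as follows (the demo_* helpers are hypothetical, not part of the series): passing 0 to alloc_workqueue() selects the workqueue core's default limit (WQ_DFL_ACTIVE), while a nonzero value caps the number of in-flight work items per queue, and the cap can be changed later:

    #include <linux/workqueue.h>

    /* 0 means "use the workqueue default" (WQ_DFL_ACTIVE); a nonzero
     * value caps concurrently executing items on this queue */
    static struct workqueue_struct *demo_alloc_wq(int thread_pool_size)
    {
            return alloc_workqueue("demo", WQ_UNBOUND, thread_pool_size);
    }

    /* the cap can also be adjusted at runtime, e.g. on a thread_pool=
     * remount, as the patched btrfs_resize_thread_pool() does */
    static void demo_resize_wq(struct workqueue_struct *wq, int new_size)
    {
            workqueue_set_max_active(wq, new_size);
    }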
Qu Wenruo
2013-Sep-13 02:03 UTC
Re: [PATCH v2 0/9] btrfs: Replace the btrfs_workers with kernel workqueue
On 2013-09-13 01:37, David Sterba wrote: > On Thu, Sep 12, 2013 at 04:08:15PM +0800, Qu Wenruo wrote: >> Use kernel workqueue and kernel workqueue based new btrfs_workqueue_struct to replace >> the old btrfs_workers. >> The main goal is to reduce the redundant codes(800 lines vs 200 lines) and >> try to get benefits from the latest workqueue changes. >> >> About the performance, the test suite I used is bonnie++, >> and there seems no significant regression. > You're replacing a core infrastructure building block, more testing is > absolutely required, but using the available infrastructure is a good > move. It definitely needs more testing, since I lack enough different disks to test with. > > I found a few things that do not replace the current implementation > one-to-one: > > * the thread names lost the btrfs- prefix, this makes it hard to > identify the processes and we want this for either debugging or > performance monitoring Yes, that's right. But the problem is, even if I add a "btrfs-" prefix to the wq, the real work executors are kernel workers without any prefix. It is still hard to debug due to the workqueue mechanism. > > * old high priority tasks were handled in threads with unchanged priority > and just prioritized within the queue; > newly added WQ_HIGHPRI elevates the nice level of the thread, ie. > it's not the same thing as before -- I need to look closer Also true. Since I didn't find a way to ensure that high priority work is executed before any normal priority work, I chose this workaround. (It seems the original btrfs_workers also has some mechanism to avoid starvation, so I think this way may be OK.) > > * the idle_thresh attribute is not reflected in the new code; I don't > know if the kernel workqueues have something equivalent It seems that the kernel will not create kthreads without any control, but this still needs more investigation to make sure. > > > Other random comments: > > * you can use the same files for the new helpers, instead of bwq.[ch] I did it this way to avoid naming conflicts and to make cleanup easy. If needed, I'll use async-thread.[ch] as well. > > * btrfs_workqueue_struct can drop the _struct suffix The naming rule is mostly copied from the kernel wq, just adding a "btrfs_" prefix when there is no naming conflict. Will modify if needed. > * WQ_MEM_RECLAIM for the scrub thread does not seem right > > * WQ_FREEZABLE should probably be set Will modify soon. > > > david > Thanks for the comment. Qu -- ----------------------------------------------------- Qu Wenruo Development Dept.I Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road, Nanjing, 210012, China TEL: +86+25-86630566-8526 COINS: 7998-8526 FAX: +86+25-83317685 MAIL: quwenruo@cn.fujitsu.com ----------------------------------------------------- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
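The two open points above, thread naming and WQ_HIGHPRI semantics, can be made concrete with a short hypothetical sketch (the demo_* names are illustrative; this is not code from the series):

    #include <linux/workqueue.h>

    /* alloc_workqueue() takes a printf-style format, so the queue name
     * itself can keep a "btrfs-" prefix; the work still executes in
     * shared kworker threads, which carry no such prefix in ps output */
    static struct workqueue_struct *demo_btrfs_wq(const char *name, int max_active)
    {
            return alloc_workqueue("btrfs-%s", WQ_UNBOUND, max_active, name);
    }

    /* WQ_HIGHPRI routes items to a separate pool of elevated-priority
     * (nice -20) kworkers, rather than reordering them ahead of normal
     * items inside one pool as the old btrfs_workers prio list did */
    static struct workqueue_struct *demo_btrfs_high_wq(int max_active)
    {
            return alloc_workqueue("btrfs-demo-high",
                                   WQ_UNBOUND | WQ_HIGHPRI, max_active);
    }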
Qu Wenruo
2013-Sep-13 03:15 UTC
Re: [PATCH v2 9/9] btrfs: Replace thread_pool_size with workqueue default value
On 2013-09-13 09:47, Liu Bo wrote: > On Thu, Sep 12, 2013 at 04:08:24PM +0800, Qu Wenruo wrote: >> The original btrfs_workers uses the fs_info->thread_pool_size as the >> max_active, and the previous patches followed this way. >> >> But the kernel workqueue has the default value(0) for workqueue, >> and workqueue itself has some threshold mechanism to prevent creating >> too many threads, so we should use the default value. >> >> Since the thread_pool_size algorithm is not used, related codes should >> also be changed. > Ohh, I should have seen this mail first before commenting on 'max_active'. > > I think that some tuning work should be done on this part; according to > my tests, setting max_active=0 will create ~258 worker helpers > (kworker/uX:X if you set WQ_UNBOUND), and this may cause too many context > switches, which will have an impact on performance in some cases. Yes, but the default number when using max_active=0 should be 256 (WQ_DFL_ACTIVE, half of WQ_MAX_ACTIVE). Also, in my test (single thread), the performance and CPU usage do not change much. So it seems that in this situation, the kernel has some control over creating kthreads. Still, further max_active tuning would be quite good. Qu > > -liubo > >> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> >> --- >> fs/btrfs/disk-io.c | 12 +++++++----- >> fs/btrfs/super.c | 3 +-- >> 2 files changed, 8 insertions(+), 7 deletions(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index a61e1fe..0446d27 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -750,9 +750,11 @@ int btrfs_bio_wq_end_io(struct btrfs_fs_info *info, struct bio *bio, >> >> unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info) >> { >> - unsigned long limit = min_t(unsigned long, >> - info->thread_pool_size, >> - info->fs_devices->open_devices); >> + unsigned long limit; >> + limit = info->thread_pool_size ? 
>> + min_t(unsigned long, info->thread_pool_size, >> + info->fs_devices->open_devices) : >> + info->fs_devices->open_devices; >> return 256 * limit; >> } >> >> @@ -2191,8 +2193,8 @@ int open_ctree(struct super_block *sb, >> INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT); >> spin_lock_init(&fs_info->reada_lock); >> >> - fs_info->thread_pool_size = min_t(unsigned long, >> - num_online_cpus() + 2, 8); >> + /* use the default value of kernel workqueue */ >> + fs_info->thread_pool_size = 0; >> >> INIT_LIST_HEAD(&fs_info->ordered_roots); >> spin_lock_init(&fs_info->ordered_root_lock); >> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c >> index 63e653c..ccf412f 100644 >> --- a/fs/btrfs/super.c >> +++ b/fs/btrfs/super.c >> @@ -898,8 +898,7 @@ static int btrfs_show_options(struct seq_file *seq, struct dentry *dentry) >> if (info->alloc_start != 0) >> seq_printf(seq, ",alloc_start=%llu", >> (unsigned long long)info->alloc_start); >> - if (info->thread_pool_size != min_t(unsigned long, >> - num_online_cpus() + 2, 8)) >> + if (info->thread_pool_size) >> seq_printf(seq, ",thread_pool=%d", info->thread_pool_size); >> if (btrfs_test_opt(root, COMPRESS)) { >> if (info->compress_type == BTRFS_COMPRESS_ZLIB) >> -- >> 1.8.4 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >-- ----------------------------------------------------- Qu Wenruo Development Dept.I Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road, Nanjing, 210012, China TEL: +86+25-86630566-8526 COINS: 7998-8526 FAX: +86+25-83317685 MAIL: quwenruo@cn.fujitsu.com ----------------------------------------------------- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Qu Wenruo
2013-Sep-17 02:41 UTC
Re: [PATCH v2 9/9] btrfs: Replace thread_pool_size with workqueue default value
On 2013-09-13 11:15, Qu Wenruo wrote: > On 2013-09-13 09:47, Liu Bo wrote: >> On Thu, Sep 12, 2013 at 04:08:24PM +0800, Qu Wenruo wrote: >>> The original btrfs_workers uses the fs_info->thread_pool_size as the >>> max_active, and the previous patches followed this way. >>> >>> But the kernel workqueue has the default value(0) for workqueue, >>> and workqueue itself has some threshold mechanism to prevent creating >>> too many threads, so we should use the default value. >>> >>> Since the thread_pool_size algorithm is not used, related codes should >>> also be changed. >> Ohh, I should have seen this mail first before commenting on 'max_active'. >> >> I think that some tuning work should be done on this part; according to >> my tests, setting max_active=0 will create ~258 worker helpers >> (kworker/uX:X if you set WQ_UNBOUND), and this may cause too many context >> switches, which will have an impact on performance in some cases. > Yes, but the default number when using max_active=0 should be 256 > (WQ_DFL_ACTIVE, half of WQ_MAX_ACTIVE). > > Also, in my test (single thread), the performance and CPU usage do not > change much. > So it seems that in this situation, the kernel has some control over > creating > kthreads. > > Still, further max_active tuning would be quite good. > > Qu Sorry for the late reply. According to the workqueue source code, the unbound workqueue will continually create new threads if needed. So the default value is not good for this situation. Also, since so many of the wqs here are unbound, the old thread_pool_size is too small. (Maybe changing some wqs to bound would be a good idea?) It would be quite nice if you have any good ideas or advice about tuning max_active. Qu >> >> -liubo >> >>> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> >>> --- >>> fs/btrfs/disk-io.c | 12 +++++++----- >>> fs/btrfs/super.c | 3 +-- >>> 2 files changed, 8 insertions(+), 7 deletions(-) >>> >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>> index a61e1fe..0446d27 100644 >>> --- a/fs/btrfs/disk-io.c >>> +++ b/fs/btrfs/disk-io.c >>> @@ -750,9 +750,11 @@ int btrfs_bio_wq_end_io(struct btrfs_fs_info >>> *info, struct bio *bio, >>> unsigned long btrfs_async_submit_limit(struct btrfs_fs_info *info) >>> { >>> - unsigned long limit = min_t(unsigned long, >>> - info->thread_pool_size, >>> - info->fs_devices->open_devices); >>> + unsigned long limit; >>> + limit = info->thread_pool_size ? 
>>> + min_t(unsigned long, info->thread_pool_size, >>> + info->fs_devices->open_devices) : >>> + info->fs_devices->open_devices; >>> return 256 * limit; >>> } >>> @@ -2191,8 +2193,8 @@ int open_ctree(struct super_block *sb, >>> INIT_RADIX_TREE(&fs_info->reada_tree, GFP_NOFS & ~__GFP_WAIT); >>> spin_lock_init(&fs_info->reada_lock); >>> - fs_info->thread_pool_size = min_t(unsigned long, >>> - num_online_cpus() + 2, 8); >>> + /* use the default value of kernel workqueue */ >>> + fs_info->thread_pool_size = 0; >>> INIT_LIST_HEAD(&fs_info->ordered_roots); >>> spin_lock_init(&fs_info->ordered_root_lock); >>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c >>> index 63e653c..ccf412f 100644 >>> --- a/fs/btrfs/super.c >>> +++ b/fs/btrfs/super.c >>> @@ -898,8 +898,7 @@ static int btrfs_show_options(struct seq_file >>> *seq, struct dentry *dentry) >>> if (info->alloc_start != 0) >>> seq_printf(seq, ",alloc_start=%llu", >>> (unsigned long long)info->alloc_start); >>> - if (info->thread_pool_size != min_t(unsigned long, >>> - num_online_cpus() + 2, 8)) >>> + if (info->thread_pool_size) >>> seq_printf(seq, ",thread_pool=%d", info->thread_pool_size); >>> if (btrfs_test_opt(root, COMPRESS)) { >>> if (info->compress_type == BTRFS_COMPRESS_ZLIB) >>> -- >>> 1.8.4 >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe >>> linux-btrfs" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> -- >> To unsubscribe from this list: send the line "unsubscribe >> linux-btrfs" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >-- ----------------------------------------------------- Qu Wenruo Development Dept.I Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road, Nanjing, 210012, China TEL: +86+25-86630566-8526 COINS: 7998-8526 FAX: +86+25-83317685 MAIL: quwenruo@cn.fujitsu.com ----------------------------------------------------- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
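For comparison, the bound/unbound distinction under discussion looks roughly like this (illustrative helpers, not code from the series):

    #include <linux/workqueue.h>

    static struct workqueue_struct *demo_bound_wq(void)
    {
            /* per-cpu pools: extra workers are only woken when the running
             * one blocks, so the thread count stays comparatively modest */
            return alloc_workqueue("demo-bound", 0, 0);
    }

    static struct workqueue_struct *demo_unbound_wq(void)
    {
            /* unbound pools spawn workers more eagerly and items may run
             * on any CPU; this is the mode that produced the ~258
             * kworker/uX:X threads reported earlier in the thread */
            return alloc_workqueue("demo-unbound", WQ_UNBOUND, 0);
    }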
Qu Wenruo
2013-Sep-20 06:13 UTC
Re: [PATCH v2 0/9] btrfs: Replace the btrfs_workers with kernel workqueue
On Thu, 12 Sep 2013 19:37:18 +0200, David Sterba wrote: > On Thu, Sep 12, 2013 at 04:08:15PM +0800, Qu Wenruo wrote: >> Use kernel workqueue and kernel workqueue based new btrfs_workqueue_struct to replace >> the old btrfs_workers. >> The main goal is to reduce the redundant codes(800 lines vs 200 lines) and >> try to get benefits from the latest workqueue changes. >> >> About the performance, the test suite I used is bonnie++, >> and there seems no significant regression. > You're replacing a core infrastructure building block, more testing is > absolutely required, but using the available infrastructure is a good > move. > > I found a few things that do not replace the current implementation > one-to-one: > > * the thread names lost the btrfs- prefix, this makes it hard to > identify the processes and we want this for either debugging or > performance monitoring > > * old high priority tasks were handled in threads with unchanged priority > and just prioritized within the queue > newly added WQ_HIGHPRI elevates the nice level of the thread, ie. > it's not the same thing as before -- I need to look closer > > * the idle_thresh attribute is not reflected in the new code; I don't > know if the kernel workqueues have something equivalent > > > Other random comments: > > * you can use the same files for the new helpers, instead of bwq.[ch] > > * btrfs_workqueue_struct can drop the _struct suffix > > * WQ_MEM_RECLAIM for the scrub thread does not seem right I think scrub_workers and scrub_wr_completion_workers still need WQ_MEM_RECLAIM. However, scrub_nocow_workers does not need the WQ_MEM_RECLAIM flag. Did you mean this? If you didn't mean this, would you please tell me why WQ_MEM_RECLAIM is not needed? Thanks Qu > > * WQ_FREEZABLE should probably be set > > > david > -- > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- ----------------------------------------------------- Qu Wenruo Development Dept.I Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road, Nanjing, 210012, China TEL: +86+25-86630566-8526 COINS: 7998-8526 FAX: +86+25-83317685 MAIL: quwenruo@cn.fujitsu.com ----------------------------------------------------- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Sterba
2013-Oct-01 14:50 UTC
Re: [PATCH v2 0/9] btrfs: Replace the btrfs_workers with kernel workqueue
On Fri, Sep 20, 2013 at 02:13:08PM +0800, Qu Wenruo wrote: > >* WQ_MEM_RECLAIM for the scrub thread does not seem right > I think scrub_workers and scrub_wr_completion_workers still need WQ_MEM_RECLAIM. > However, scrub_nocow_workers does not need the WQ_MEM_RECLAIM flag. > > Did you mean this? > > If you didn't mean this, would you please tell me why WQ_MEM_RECLAIM is > not > needed? Documentation says that threads that might be used in the memory reclaim path must use this flag, but I don't see how this applies to scrub threads. They're not writing out dirty data, though they may issue a write, but that's not their main purpose. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
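Given that reasoning, a v3 scrub queue might plausibly drop WQ_MEM_RECLAIM and pick up the WQ_FREEZABLE flag suggested earlier in the thread; a hypothetical sketch, not a committed interface:

    #include <linux/workqueue.h>

    /* WQ_MEM_RECLAIM would pin a dedicated rescuer thread per queue and
     * is only required for queues on the memory reclaim path, which
     * scrub is not; WQ_FREEZABLE lets the queue quiesce across suspend */
    static struct workqueue_struct *demo_scrub_wq(int max_active)
    {
            return alloc_workqueue("btrfs-scrub",
                                   WQ_UNBOUND | WQ_FREEZABLE, max_active);
    }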
Qu Wenruo
2013-Oct-02 01:50 UTC
Re: [PATCH v2 0/9] btrfs: Replace the btrfs_workers with kernel workqueue
On Tue, 1 Oct 2013 16:50:50 +0200, David Sterba wrote: > On Fri, Sep 20, 2013 at 02:13:08PM +0800, Qu Wenruo wrote: >>> * WQ_MEM_RECLAIM for the scrub thread does not seem right >> I think scrub_workers and scrub_wr_completion_workers still need WQ_MEM_RECLAIM. >> However, scrub_nocow_workers does not need the WQ_MEM_RECLAIM flag. >> >> Did you mean this? >> >> If you didn't mean this, would you please tell me why WQ_MEM_RECLAIM is >> not >> needed? > Documentation says that threads that might be used in the memory reclaim > path must use this flag, but I don't see how this applies to scrub > threads. They're not writing out dirty data, though they may issue a > write, but that's not their main purpose. > Thanks for your explanation. I understand now, and will remove the WQ_MEM_RECLAIM flags in the upcoming v3 patches. -- ----------------------------------------------------- Qu Wenruo Development Dept.I Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST) No. 6 Wenzhu Road, Nanjing, 210012, China TEL: +86+25-86630566-8526 COINS: 7998-8526 FAX: +86+25-83317685 MAIL: quwenruo@cn.fujitsu.com ----------------------------------------------------- -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html